GENE EXPRESSION PROFILES IN ESOPHAGEAL TISSUE



INVENTORS: Amanda WILLIAMS, Joseph F. BOLAND, Reginald V. LORD,
Chris ALVARES, Jon C. WETZEL, Uwe SCHERF, Joseph G. VOCKLEY

BACKGROUND OF THE INVENTION 

There are two main types of esophageal cancer: squamous cell carcinoma (SCC) and
adenocarcinoma. The worldwide incidence of esophageal SCC is higher than that of
adenocarcinoma; however, in the last few decades, the incidence of adenocarcinoma in
Western countries has been increasing at a dramatic rate. As a result, esophageal
adenocarcinoma is the most common esophageal cancer type among Caucasian patients in
some populations (Blot & McLaughlin, Semin. Oncol. (1999) 26, 2-8).

The main risk factor for development of esophageal adenocarcinoma is the presence
of Barrett's esophagus, a disease in which the normal squamous epithelium of the lower
esophagus is replaced by columnar mucosa in response to injury caused by chronic
gastroesophageal reflux (Lagergren et al., N. Engl. J. Med. (1999) 340, 825-831; Barrett et
al., Nat. Genet. (1999) 22, 106-109; Reid & Weinstein, Annu. Rev. Med. (1987) 38,
477-492). Barrett's esophagus is a disorder in which the lining of the esophagus undergoes
cellular changes in response to the chronic irritation and inflammation of reflux esophagitis.
This condition is more common in men than women. The patient with Barrett's esophagus is
at an increased risk of developing cancer of the esophagus. Symptoms are similar to those of
reflux esophagitis and include heartburn, difficulty swallowing and pain that is relieved by
antacid use or eating. The diagnosis of Barrett's is made by a biopsy of the esophageal mucosa
through an endoscope. Treatment includes control of reflux disease, weight reduction and
avoidance of alcohol, tobacco, fatty foods and lying flat after eating. Close follow-up is
recommended to be certain the individual does not develop cancer of the esophagus.

The precursor cell for Barrett's epithelium has not been identified, leaving the origin
of Barrett's esophagus open to speculation. One theory suggests that denudation of the
squamous epithelium layer by reflux acid allows gastric columnar cells to move into the site
and take over (Bremner et al., Surgery (1970) 68, 209-16). More recently, cytokeratin
expression data have been used to suggest that Barrett's epithelium evolves from a basal cell in
the esophageal squamous epithelium (Boch et al., Gastroenterology (1997) 112, 760-765;
Salo et al., Ann. Med. (1996) 28, 305-309).

The advent of cDNA and oligonucleotide arrays has enabled researchers to map
tissue-specific expression levels for thousands of genes (Alon et al., Proc. Natl. Acad. Sci.
USA (1999) 96, 6745-6750; Iyer et al., Science (1999) 283, 83-87; Khan et al., Cancer Res.
(1998) 58, 5009-13; Lee et al., Science (1999) 285, 1390-1393; Wang et al., Gene (1999)
229, 101-108; Whitney et al., Ann. Neurol. (1999) 46, 425-428). Instead of assigning
individual genes to a disease phenotype, expression profiles can be created which identify
changes in total gene expression in the diseased tissue in relationship to normal adjacent
tissue. Present day cancer research, particularly research in the field of adenocarcinoma, has
focused on determining the expression levels of individual genes, with little effort
expended on determining the global changes in gene expression that are correlated with the
development and progression of adenocarcinoma.

There remains a need in the art for materials and methods that permit a more accurate
diagnosis of esophageal cancer and, in particular, esophageal adenocarcinoma. In addition,
there remains a need in the art for methods to treat and methods to identify agents that can
effectively treat esophageal cancer. The present invention meets these and other needs.

SUMMARY OF THE INVENTION 

The present invention is based in part on the global changes in gene expression 
associated with esophageal cancer identified by examining gene expression in tissue from
normal and diseased esophagus. The present invention also includes expression profiles
which serve as useful diagnostic markers as well as markers that can be used to monitor 
disease states, disease progression, drug toxicity, drug efficacy and drug metabolism. 

The invention includes methods of diagnosing esophageal cancer in a patient
comprising the step of detecting the level of expression in a tissue sample of two or more
genes from Tables 2-8; wherein differential expression of the genes in Tables 2-8 is indicative
of esophageal cancer. In some preferred embodiments, the method may include detecting the
expression level of one or more genes selected from a group consisting of apolipoprotein C-I,
galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10, MPP1, transglutaminase 1,
aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.

The invention also includes methods of detecting the progression of esophageal
cancer. For instance, methods of the invention include detecting the progression of
esophageal cancer in a patient comprising the step of detecting the level of expression in a
tissue sample of two or more genes from Tables 2-8; wherein differential expression of the
genes in Tables 2-8 is indicative of esophageal cancer progression. In some preferred
embodiments, the progression may be the progression of Barrett's esophagus to esophageal
cancer. In some preferred embodiments, the method may include detecting the expression
level of one or more genes selected from a group consisting of apolipoprotein C-I, galectin
4, keratin 18, annexin A10, cathepsin E, homeobox C10, MPP1, transglutaminase 1,
aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.

In some aspects, the present invention provides a method of monitoring the treatment
of a patient with esophageal cancer, comprising administering a pharmaceutical composition
to the patient and preparing a gene expression profile from a cell or tissue sample from the
patient and comparing the patient gene expression profile to a gene expression profile from a
cell population comprising normal esophageal cells or to a gene expression profile from a cell
population comprising esophageal cancer cells or to both. In some preferred embodiments,
the gene profile will include the expression level of one or more genes in Tables 2-8. In other
preferred embodiments, one or more genes may be selected from a group consisting of
apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10,
MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.

In another aspect, the present invention provides a method of treating a patient with 
esophageal cancer, comprising administering to the patient a pharmaceutical composition, 
wherein the composition alters the expression of at least one gene in Tables 2-8, preparing a
gene expression profile from a cell or tissue sample from the patient comprising tumor cells 
and comparing the patient expression profile to a gene expression profile from an untreated 
cell population comprising esophageal cancer cells. 

In one aspect, the present invention provides a method of diagnosing esophageal 
adenocarcinoma in a patient, comprising detecting the level of expression in a tissue sample
of two or more genes from Tables 2-8, wherein differential expression of the genes in Tables 
2-8 is indicative of esophageal adenocarcinoma. 

In another aspect, the present invention provides a method of detecting the 
progression of esophageal adenocarcinoma in a patient, comprising detecting the level of 
expression in a tissue sample of two or more genes from Tables 2-8; wherein differential
expression of the genes in Tables 2-8 is indicative of esophageal adenocarcinoma 
progression. 

The present invention also provides materials and methods for monitoring the
treatment of a patient with esophageal adenocarcinoma. The present invention provides a
method of monitoring the treatment of a patient with esophageal adenocarcinoma, comprising
administering a pharmaceutical composition to the patient, preparing a gene expression
profile from a cell or tissue sample from the patient and comparing the patient gene
expression profile to a gene expression profile from a cell population comprising normal
esophageal cells or to a gene expression profile from a cell population comprising esophageal
adenocarcinoma cells or to both. In some preferred embodiments, the method may include
detecting the level of expression of one or more genes selected from a group consisting of
apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10,
MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.

In a related aspect, the present invention provides a method of treating a patient with 
esophageal adenocarcinoma, comprising administering to the patient a pharmaceutical 
composition, wherein the composition alters the expression of at least one gene in Tables 2-8, 
preparing a gene expression profile from a cell or tissue sample from the patient comprising 
esophageal adenocarcinoma cells and comparing the patient expression profile to a gene 
expression profile from an untreated cell population comprising esophageal adenocarcinoma 
cells. In some preferred embodiments, one or more genes may be selected from a group 
consisting of apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E,
homeobox C10, MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or
mucin 5B. 

The invention further includes methods of screening for an agent capable of 
modulating the onset or progression of esophageal cancer, comprising the steps of exposing a 
cell to the agent; and detecting the expression level of two or more genes from Tables 2-8. In 
some embodiments, the esophageal cancer may be an esophageal adenocarcinoma. In some 
preferred embodiments, one or more genes may be selected from a group consisting of 
apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10,
MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.
Preferred methods may detect all or nearly all of the genes in the tables. 

The invention further includes compositions comprising at least two oligonucleotides,
wherein each of the oligonucleotides comprises a sequence that specifically hybridizes to a
gene in Tables 2-8 as well as solid supports comprising at least two probes, wherein each of
the probes comprises a sequence that specifically hybridizes to a gene in Tables 2-8. In some
preferred embodiments, one or more genes may be selected from a group consisting of
apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10,
MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B.

The invention further includes computer systems comprising a database containing
information identifying the expression level in esophageal tissue of a set of genes comprising
at least two genes in Tables 2-8 and a user interface to view the information. In some
preferred embodiments, one or more genes may be selected from a group consisting of
apolipoprotein C-I, galectin 4, keratin 18, annexin A10, cathepsin E, homeobox C10,
MPP1, transglutaminase 1, aquaporin 3, trefoil peptide 1, trefoil peptide 2 or mucin 5B. The
database may further include sequence information for the genes, information identifying the 
expression level for the set of genes in normal esophageal tissue and cancerous tissue and 
may contain links to external databases such as GenBank. 

Lastly, the invention includes methods of using the databases, such as methods of
using the disclosed computer systems to present information identifying the expression level
in a tissue or cell of at least one gene in Tables 2-8, comprising the step of comparing the
expression level of at least one gene in Tables 2-8 in the tissue or cell to the level of
expression of the gene in the database. In some preferred embodiments, one or more genes
may be selected from a group consisting of apolipoprotein C-I, galectin 4, keratin 18,
annexin A10, cathepsin E, homeobox C10, MPP1, transglutaminase 1, aquaporin 3, trefoil
peptide 1, trefoil peptide 2 or mucin 5B.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the results of a cluster analysis. Figure 1a shows genes
underexpressed in BA while Figures 1b, 1c and 1d show genes overexpressed in BA.

Figure 2 shows the results of a cluster analysis. Figure 2a shows genes identified as 
markers for squamous epithelial cells. Figure 2b shows genes involved in extracellular 
matrix (ECM) modification. Figure 2c shows genes involved in cell adhesion, migration, 
proliferation and differentiation. 

DETAILED DESCRIPTION

Many biological functions are accomplished by altering the expression of various 
genes through transcriptional (e.g., through control of initiation, provision of RNA 
precursors, RNA processing, etc.) and/or translational control. For example, fundamental 
biological processes such as cell cycle, cell differentiation and cell death, are often 
characterized by the variations in the expression levels of groups of genes. 

Changes in gene expression also are associated with pathogenesis. For example, the 
lack of sufficient expression of functional tumor suppressor genes and/or the overexpression
of oncogenes/proto-oncogenes could lead to tumorigenesis or hyperplastic growth of cells
(Marshall, (1991) Cell, 64, 313-326; Weinberg, (1991) Science, 254, 1138-1146). Thus, 
changes in the expression levels of particular genes (e.g., oncogenes or tumor suppressors) 
serve as signposts for the presence and progression of various diseases. 

Monitoring changes in gene expression may also provide certain advantages during
drug screening and development. Often drugs are screened and prescreened for the ability to
interact with a major target without regard to other effects the drugs have on cells. Often
such other effects cause toxicity in the whole animal, which prevents the development and use
of the potential drug.

Applicants have examined normal esophageal tissue and tissue from
esophageal tumors to identify global changes in gene expression between tumor biopsies and
normal tissue. These global changes in gene expression, also referred to as expression 
profiles, provide useful markers for diagnostic uses as well as markers that can be used to 
monitor disease states, disease progression, drug toxicity, drug efficacy and drug metabolism. 

Expression profiles of genes in particular tissues, disease states or disease progression 
stages provide molecular tools for evaluating toxicity, drug efficacy, drug metabolism, 
development, and disease monitoring. Changes in the expression profile from a baseline 
profile can be used as an indication of such effects. Those skilled in the art can use any of a 
variety of known techniques to evaluate the expression of one or more of the genes and/or 
ESTs identified in the instant application in order to observe changes in the expression 
profile. 

The present application has identified differences in gene expression between normal 
esophageal tissue and esophageal adenocarcinoma. Barrett's epithelium was identified 
adjacent to many of the cancers. In some cases, the tumor involved an extensive area of 
esophageal mucosa, suggesting that it had overgrown the Barrett's epithelium from which it
derived. Genes and ESTs have been found whose expression varies significantly (>3 fold
change up or down) between normal and malignant tissue. In preferred embodiments, the
expression level of one or more of these genes and/or ESTs can be determined using as 
interrogators probes specific to one or more of these genes and/or ESTs. This permits the 
determination of the expression pattern in unknown cells or samples and their identification 
as benign or malignant. The expression patterns of the genes and ESTs which were examined 
are listed in Tables 2-8. The complete sequences of the genes and ESTs are available from 
GenBank using the Accession numbers shown in each table. 

Definitions 

In the description that follows, numerous terms and phrases known to those skilled in 
the art are used. In the interest of clarity and consistency of interpretation, the definitions of 
certain terms and phrases are provided. 

The present invention provides compositions and methods to detect the level of
expression of genes that may be differentially expressed dependent upon the state of the cell,
i.e., normal versus cancerous. As used herein, the phrase "detecting the level of expression"
includes methods that quantify expression levels as well as methods that determine whether a
gene of interest is expressed at all. Thus, an assay which provides a yes or no result without
necessarily providing quantification of an amount of expression is an assay that requires
"detecting the level of expression" as that phrase is used herein.

As used herein, oligonucleotide sequences that are complementary to one or more of
the genes described herein refer to oligonucleotides that are capable of hybridizing under
stringent conditions to at least part of the nucleotide sequence of said genes. Such
hybridizable oligonucleotides will typically exhibit at least about 75% sequence identity at 
the nucleotide level to said genes, preferably about 80% or 85% sequence identity or more 
preferably about 90% or 95% or more sequence identity to said genes. 

"Bind(s) substantially" refers to complementary hybridization between a probe 
nucleic acid and a target nucleic acid and embraces minor mismatches that can be 
accommodated by reducing the stringency of the hybridization media to achieve the desired 
detection of the target polynucleotide sequence. 

The terms "background" or "background signal intensity" refer to hybridization
signals resulting from non-specific binding, or other interactions, between the labeled target
nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes,
control probes, the array substrate, etc.). Background signals may also be produced by 
intrinsic fluorescence of the array components themselves. A single background signal can 
be calculated for the entire array, or a different background signal may be calculated for each 
target nucleic acid. In a preferred embodiment, background is calculated as the average 
hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a 
different background signal is calculated for each target gene, for the lowest 5% to 10% of the 
probes for each gene. Of course, one of skill in the art will appreciate that where the probes 
to a particular gene hybridize well and thus appear to be specifically binding to a target 
sequence, they should not be used in a background signal calculation. Alternatively, 
background may be calculated as the average hybridization signal intensity produced by 
hybridization to probes that are not complementary to any sequence found in the sample (e.g., 
probes directed to nucleic acids of the opposite sense or to genes not found in the sample such 
as bacterial genes where the sample is mammalian nucleic acids). Background can also be 
calculated as the average signal intensity produced by regions of the array that lack any 
probes at all. 
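
The first background estimate described above (the average intensity of the lowest 5% to 10% of probes) can be sketched as follows. This is a minimal illustration rather than the applicants' implementation: the function names, the default fraction, and the clamping of corrected values at zero are assumptions introduced here.

```python
from statistics import mean


def background_signal(intensities, fraction=0.05):
    """Average the lowest `fraction` of probe intensities.

    `fraction` would typically lie between 0.05 and 0.10, following the
    text; at least one probe is always used (an assumption for tiny arrays).
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    n_lowest = max(1, round(len(ordered) * fraction))
    return mean(ordered[:n_lowest])


def subtract_background(intensities, fraction=0.05):
    """Subtract the estimated background, clamping at zero (an assumption)."""
    bg = background_signal(intensities, fraction)
    return [max(0.0, value - bg) for value in intensities]
```

A per-gene background, as the text also permits, would call `background_signal` once per gene on that gene's probes only, after excluding probes that appear to bind their target specifically.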

The phrase "hybridizing specifically to" refers to the binding, duplexing or 
hybridizing of a molecule substantially to or only to a particular nucleotide sequence or 
sequences under stringent conditions when that sequence is present in a complex mixture 
(e.g., total cellular) DNA or RNA. 

Assays and methods of the invention may utilize available formats to simultaneously 
screen at least about 100, preferably about 1000, more preferably about 10,000 and most 
preferably about 1,000,000 or more different nucleic acid hybridizations. 

The terms "mismatch control" or "mismatch probe" refer to a probe whose sequence 
is deliberately selected not to be perfectly complementary to a particular target sequence. For 
each mismatch (MM) control in a high-density array there typically exists a corresponding 
perfect match (PM) probe that is perfectly complementary to the same particular target 
sequence. The mismatch may comprise one or more bases. 

While the mismatch(es) may be located anywhere in the mismatch probe, terminal
mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of 



WO 01/074405 PCT/US01/09847 

the target sequence. In a particularly preferred embodiment, the mismatch is located at or 
near the center of the probe such that the mismatch is most likely to destabilize the duplex 
with the target sequence under the test hybridization conditions. 
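
As an illustration of a centrally placed mismatch, the sketch below derives a mismatch (MM) probe from a perfect match (PM) probe by changing the middle base. Substituting the complementary base is an assumption made for this example; the text requires only that the mismatch lie at or near the center. The function name is hypothetical.

```python
# Base substitution used for the mismatch; swapping in the complement is
# an illustrative choice, not something the text prescribes.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match):
    """Return a copy of the PM probe with its central base substituted."""
    pm = perfect_match.upper()
    if not pm or any(base not in COMPLEMENT for base in pm):
        raise ValueError("probe must be a non-empty DNA sequence of A, C, G, T")
    center = len(pm) // 2  # middle position (0-based)
    return pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]
```

For a 25-mer probe, `center` is index 12, i.e. the 13th base.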

The term "perfect match probe" refers to a probe that has a sequence that is perfectly
complementary to a particular target sequence. The test probe is typically perfectly
complementary to a portion (subsequence) of the target sequence. The perfect match (PM)
probe can be a "test probe", a "normalization control" probe, an expression level control
probe and the like. A perfect match control or perfect match probe is, however, distinguished
from a "mismatch control" or "mismatch probe."

As used herein a "probe" is defined as a nucleic acid, preferably an oligonucleotide,
capable of binding to a target nucleic acid of complementary sequence through one or more
types of chemical bonds, usually through complementary base pairing, usually through
hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or T)
or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be
joined by a linkage other than a phosphodiester bond, so long as it does not interfere with
hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are
joined by peptide bonds rather than phosphodiester linkages.

The term "stringent conditions" refers to conditions under which a probe will
hybridize to its target subsequence, but with only insubstantial hybridization to other
sequences or to other sequences such that the difference may be identified. Stringent
conditions are sequence-dependent and will be different in different circumstances. Longer
sequences hybridize specifically at higher temperatures. Generally, stringent conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH.

Typically, stringent conditions will be those in which the salt concentration is at least
about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the
temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent
conditions may also be achieved with the addition of destabilizing agents such as formamide.

The "percentage of sequence identity" or "sequence identity" is determined by
comparing two optimally aligned sequences or subsequences over a comparison window or
span, wherein the portion of the polynucleotide sequence in the comparison window may
optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence
(which does not comprise additions or deletions) for optimal alignment of the two sequences.
The percentage is calculated by determining the number of positions at which the identical
subunit (e.g., nucleic acid base or amino acid residue) occurs in both sequences to yield the
number of matched positions, dividing the number of matched positions by the total number
of positions in the window of comparison and multiplying the result by 100 to yield the
percentage of sequence identity. Percentage sequence identity when calculated using the
programs GAP or BESTFIT (see below) is calculated using default gap weights.
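
The calculation just described can be written out as a short function. This sketch assumes the two sequences have already been optimally aligned (for example by GAP or BESTFIT) and are passed as equal-length strings with "-" marking gaps; the function name and the gap symbol are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of window positions holding the identical subunit.

    Matched positions are divided by the total number of positions in the
    comparison window, so gap positions count toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)
```

For the six-position window `ACGT-A` against `ACCTGA`, four positions match, giving 4/6, or roughly 66.7% identity.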

Homology or identity may be determined by BLAST (Basic Local Alignment Search
Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn
and tblastx (Karlin et al., (1990) Proc. Natl. Acad. Sci. USA 87, 2264-2268 and Altschul,
(1993) J. Mol. Evol. 36, 290-300, fully incorporated by reference) which are tailored for
sequence similarity searching. The approach used by the BLAST program is to first consider
similar segments between a query sequence and a database sequence, then to evaluate the
statistical significance of all matches that are identified and finally to summarize only those
matches which satisfy a preselected threshold of significance. For a discussion of basic
issues in similarity searching of sequence databases, see Altschul et al., ((1994) Nature Genet.
6, 119-129) which is fully incorporated by reference. The search parameters for histogram,
descriptions, alignments, expect (i.e., the statistical significance threshold for reporting
matches against database sequences), cutoff, matrix and filter are at the default settings. The
default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix
(Henikoff et al., (1992) Proc. Natl. Acad. Sci. USA 89, 10915-10919, fully incorporated by
reference). Four blastn parameters were adjusted as follows: Q=10 (gap creation penalty);
R=10 (gap extension penalty); wink=1 (generates word hits at every wink-th position along the
query); and gapw=16 (sets the window width within which gapped alignments are generated).
The equivalent Blastp parameter settings were Q=9; R=2; wink=1; and gapw=32. A BESTFIT
comparison between sequences, available in the GCG package version 10.0, uses DNA
parameters GAP=50 (gap creation penalty) and LEN=3 (gap extension penalty) and the
equivalent settings in protein comparisons are GAP=8 and LEN=2.

Uses of Differentially Expressed Genes

The present invention identifies those genes differentially expressed between normal 
esophageal tissue and cancerous esophageal tissue. One of skill in the art can select one or 
more of the genes identified as being differentially expressed and use the information and
methods provided herein to interrogate or test a particular sample. For a particular
interrogation of two conditions or sources, it is desirable to select those genes that display a
great difference in the expression pattern between the two conditions or sources. At least a
two-fold difference is desirable, but a three-fold, five-fold or ten-fold difference may be preferred.
Interrogations of the genes or proteins can be performed to yield information on gene 
expression as well as on the levels of the encoded proteins. 
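
Selecting genes by fold difference between two conditions might look like the sketch below. The gene identifiers, expression values, dictionary layout, and the signed fold-change convention (negative values for decreased expression) are hypothetical choices made for this example and do not come from Tables 2-8.

```python
def fold_change(normal, tumor):
    """Signed fold change: positive if tumor >= normal, negative otherwise."""
    if normal <= 0 or tumor <= 0:
        raise ValueError("expression levels must be positive")
    return tumor / normal if tumor >= normal else -normal / tumor


def select_markers(normal_levels, tumor_levels, threshold=2.0):
    """Keep genes whose fold change, up or down, meets the threshold."""
    selected = {}
    for gene in normal_levels.keys() & tumor_levels.keys():
        change = fold_change(normal_levels[gene], tumor_levels[gene])
        if abs(change) >= threshold:
            selected[gene] = change
    return selected


# Hypothetical expression levels, for illustration only.
normal = {"geneA": 100, "geneB": 100, "geneC": 300}
tumor = {"geneA": 150, "geneB": 500, "geneC": 100}
markers = select_markers(normal, tumor, threshold=2.0)
```

Raising `threshold` to 3.0, 5.0 or 10.0 applies the stricter cutoffs mentioned in the text.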

Diagnostic Uses for the Esophageal Cancer Markers 

As described herein, the genes and gene expression information provided in Tables 2-8
may be used as diagnostic markers for the prediction or identification of the malignant state
of the esophageal tissue. For instance, an esophageal tissue sample or other sample from a
patient may be assayed by any of the methods known to those skilled in the art, and the
expression levels from one or more genes from Tables 2-8 may be compared to the
expression levels found in normal esophageal tissue, tissue from esophageal adenocarcinoma
or both. Expression profiles generated from the tissue or other sample that substantially
resemble an expression profile from normal or diseased esophageal tissue may be used, for
instance, to aid in disease diagnosis. Comparison of the expression data, as well as available
sequence or other information, may be done by a researcher or diagnostician or may be done
with the aid of a computer and databases as described herein.
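
One way a computer could judge whether a sample profile "substantially resembles" a reference profile is to correlate the two across shared genes. The text does not specify a similarity measure; Pearson correlation is an assumption chosen for this sketch, and the gene identifiers and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def closest_reference(sample, references):
    """Return the best-correlated reference label and all scores."""
    scores = {}
    for label, profile in references.items():
        shared = sorted(g for g in sample if g in profile)
        scores[label] = pearson(
            [sample[g] for g in shared], [profile[g] for g in shared]
        )
    return max(scores, key=scores.get), scores


# Hypothetical reference and sample profiles, for illustration only.
references = {
    "normal": {"g1": 10, "g2": 200, "g3": 50},
    "tumor": {"g1": 300, "g2": 20, "g3": 60},
}
sample = {"g1": 280, "g2": 30, "g3": 70}
label, scores = closest_reference(sample, references)
```

Such a score would only aid diagnosis alongside the other information the text mentions; it is not presented here as a validated classifier.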

Use of the Esophageal Cancer Markers for Monitoring Disease Progression 

Molecular expression markers for esophageal cancer can be used to confirm
determinations of the type and progression of the cancer made on the basis of morphological
criteria. For example, squamous cell carcinoma could be distinguished from adenocarcinoma
based on the level and type of genes expressed in a tissue sample. In some situations,
identification of cell type or source is ambiguous based on classical criteria. In these
situations the molecular expression markers of the present invention are useful.

In addition, progression of esophageal squamous cell carcinoma to adenocarcinoma
can be monitored by following the expression patterns of the involved genes using the
molecular expression markers of the present invention. Perturbed expression can be observed
in the diseased state. Monitoring of the efficacy of certain drug regimens can also be
accomplished by following the expression patterns of the molecular expression markers.

Although only a few different disease progression time points have been observed, as 
shown in the examples below, other developmental stages can be studied using these same 
molecular expression markers. The importance of these markers in development has been
shown here; however, variations in their expression may occur at other times. For example,
one could study the expression of these markers at benign stages for comparison to 
expression at malignant states. 

As described above, the genes and gene expression information provided in Tables 2-8
may also be used as markers for the monitoring of disease progression, for instance, the
development of esophageal cancer. For instance, an esophageal tissue sample or other sample
from a patient may be assayed by any of the methods known to those of skill in the art, and
the expression levels in the sample from a gene or genes from Tables 2-8 may be compared to
the expression levels found in normal esophageal tissue, tissue from esophageal cancer, in
particular, Barrett's-associated esophageal adenocarcinoma (BA), or both. Comparison of the
expression data, as well as available sequence or other information, may be done by a researcher
or diagnostician or may be done with the aid of a computer and databases as described herein.

Use of the Esophageal Cancer Markers for Drug Screening

Potential drugs can be screened to determine if application of the drug alters the
expression of one or more of the genes identified herein. This may be useful, for example, in
determining whether a particular drug is effective in treating a particular patient or patient
population with esophageal cancer. In the case where the expression of a gene is affected by
the potential drug such that its level of expression returns to normal, the drug is indicated in
the treatment of esophageal cancer. Similarly, a drug that causes expression of a gene which
is not normally expressed by epithelial cells in the esophagus may be contraindicated in the
treatment of esophageal cancer.

According to the present invention, the genes identified in Tables 2-8 may be used as
markers to evaluate the effects of a candidate drug or agent on a cell, particularly a cell
undergoing malignant transformation, for instance, an esophageal cancer cell or tissue
sample. A candidate drug or agent can be screened for the ability to stimulate the
transcription or expression of a given marker or markers (drug targets) or to down-regulate or
inhibit the transcription or expression of a marker or markers. According to the present
invention, one can also compare the specificity of the effects of a drug by looking at the
number of markers affected by the drug and comparing it to the number of markers
affected by a different drug. A more specific drug will affect fewer transcriptional targets.
Similar sets of markers identified for two drugs indicate a similarity of effects.
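
The specificity comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the marker names, the treated/untreated expression ratios and the two-fold cutoff for calling a marker "affected" are hypothetical, and the Jaccard index is used here merely as one convenient way to express the similarity of two marker sets.

```python
def affected_markers(fold_changes, threshold=2.0):
    """Return the set of markers whose expression changes by at least
    `threshold`-fold in either direction (threshold is an assumed cutoff)."""
    return {
        gene for gene, fc in fold_changes.items()
        if fc >= threshold or fc <= 1.0 / threshold
    }

def jaccard(a, b):
    """Overlap of two marker sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical treated/untreated expression ratios for four markers.
drug_a = {"gene1": 3.1, "gene2": 0.4, "gene3": 1.1, "gene4": 0.9}
drug_b = {"gene1": 2.5, "gene2": 0.3, "gene3": 4.0, "gene4": 0.2}

set_a = affected_markers(drug_a)
set_b = affected_markers(drug_b)
# The drug affecting fewer markers is treated as the more specific one.
more_specific = "drug_a" if len(set_a) < len(set_b) else "drug_b"
similarity = jaccard(set_a, set_b)
```

Here the first hypothetical drug affects two markers and the second affects four, so the first would be considered the more specific of the two.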

Assays to monitor the expression of a marker or markers as defined in Tables 2-8 may
utilize any available means of monitoring for changes in the expression level of the nucleic
acids of the invention. As used herein, an agent is said to modulate the expression of a
nucleic acid of the invention if it is capable of up- or down-regulating expression of the
nucleic acid in a cell.

Agents that are assayed in the above methods can be randomly selected or rationally
selected or designed. As used herein, an agent is said to be randomly selected when the agent
is chosen randomly without considering the specific sequences involved in the association of
a protein of the invention alone or with its associated substrates, binding partners, etc. An
example of randomly selected agents is the use of a chemical library, a peptide combinatorial
library, or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent is
chosen on a nonrandom basis which takes into account the sequence of the target site and/or
its conformation in connection with the agent's action. Agents can be rationally selected or
rationally designed by utilizing the peptide sequences that make up these sites. For example,
a rationally selected peptide agent can be a peptide whose amino acid sequence is identical to
or a derivative of any functional consensus site.

The agents of the present invention can be, as examples, peptides, small molecules,
vitamin derivatives, as well as carbohydrates, lipids, oligonucleotides and covalent and
non-covalent combinations thereof. Dominant negative proteins, DNA encoding these proteins,
antibodies to these proteins, peptide fragments of these proteins or mimics of these proteins
may be introduced into cells to affect function. "Mimic" as used herein refers to the
modification of a region or several regions of a peptide molecule to provide a structure
chemically different from the parent peptide but topographically and functionally similar to
the parent peptide (see Grant, (1995) in Molecular Biology and Biotechnology, Meyers
(editor), VCH Publishers). A skilled artisan can readily recognize that there is no limit as to
the structural nature of the agents of the present invention.

Assay Formats

The genes identified as being differentially expressed in esophageal cancer may be
used in a variety of nucleic acid detection assays to detect or quantify the expression level of
a gene or multiple genes in a given sample. For example, traditional Northern blotting,
nuclease protection, RT-PCR and differential display methods may be used for detecting gene
expression levels.

The protein products of the genes identified herein can also be assayed to determine
the amount of expression. Methods for assaying for a protein include Western blot,
immunoprecipitation and radioimmunoassay. It is preferred, however, that the mRNA be
assayed as an indication of expression. Methods for assaying for mRNA include Northern
blots, slot blots, dot blots, and hybridization to an ordered array of oligonucleotides. Any
method for specifically and quantitatively measuring a specific protein or mRNA or DNA
product can be used. However, methods and assays of the invention are most efficiently
designed with array or chip hybridization-based methods for detecting the expression of a
large number of genes.

Any hybridization assay format may be used, including solution-based and solid
support-based assay formats. A preferred solid support is a high density array, also known as
a DNA chip or a gene chip. In one assay format, gene chips containing probes to at least two
genes from Tables 2-8 may be used to directly monitor or detect changes in gene expression
in the treated or exposed cell as described herein.

Additional assay formats may be used to monitor the ability of the agent to modulate
the expression of a gene identified in Tables 2-8. For instance, as described above, mRNA
expression may be monitored directly by hybridization of probes to the nucleic acids of the
invention. Cell lines are exposed to an agent to be tested under appropriate conditions and
time, and total RNA or mRNA is isolated by standard procedures such as those disclosed in
Sambrook et al., (1989) Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory Press. In some embodiments, it may be desirable to amplify one or more of the
RNA molecules isolated prior to application of the RNA to the gene chip. Using techniques
well known in the art, the RNA may be reverse transcribed and amplified in the form of DNA
or may be reverse transcribed into DNA and the DNA used as a template for transcription to
generate recombinant RNA (rRNA). Any method that results in the production of a sufficient
quantity of nucleic acid to be hybridized effectively to the gene chip may be used.



WO 01/074405 



PCT/US01/09847 



In another format, cell lines that contain reporter gene fusions between the open
reading frame and/or the 3' or 5' regulatory regions of a gene in Tables 2-8 and any assayable
fusion partner may be prepared. Numerous assayable fusion partners are known and readily
available, including the firefly luciferase gene and the gene encoding chloramphenicol
acetyltransferase (Alam et al., (1990) Anal. Biochem. 188, 245-254). Cell lines containing
the reporter gene fusions are then exposed to the agent to be tested under appropriate
conditions and time. Differential expression of the reporter gene between samples exposed to
the agent and control samples identifies agents which modulate the expression of the nucleic
acid.

In another assay format, cells or cell lines are first identified which express one or
more of the gene products of the invention physiologically. Cells and/or cell lines so
identified would preferably comprise the necessary cellular machinery to ensure that the
transcriptional and/or translational apparatus of the cells would faithfully mimic the response
of normal or cancerous esophageal tissue to an exogenous agent. Such machinery would
likely include appropriate surface transduction mechanisms and/or cytosolic factors. Such
cell lines may be, but are not required to be, derived from esophageal tissue. The cells and/or
cell lines may then be contacted with an agent and the expression of one or more of the genes
of interest may then be assayed. The genes may be assayed at the mRNA level and/or at the
protein level.

In some embodiments, such cells or cell lines may be transduced or transfected with
an expression vehicle (e.g., a plasmid or viral vector) containing an expression construct
comprising an operable 5'-promoter containing end of a gene of interest identified in Tables
2-8 fused to one or more nucleic acid sequences encoding one or more antigenic fragments.
The construct may comprise all or a portion of the coding sequence of the gene of interest
which may be positioned 5'- or 3'- to a sequence encoding an antigenic fragment. The
coding sequence of the gene of interest may be translated or un-translated after transcription
of the gene fusion. At least one antigenic fragment may be translated. The antigenic
fragments are selected so that the fragments are under the transcriptional control of the
promoter of the gene of interest and are expressed in a fashion substantially similar to the
expression pattern of the gene of interest. The antigenic fragments may be expressed as
polypeptides whose molecular weight can be distinguished from the naturally occurring
polypeptides. In some embodiments, gene products of the invention may further comprise an
immunologically distinct tag. Such a process is well known in the art (see Sambrook et al.,
(1989) Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Laboratory Press).

Cells or cell lines transduced or transfected as outlined above are then contacted with
agents under appropriate conditions; for example, the agent comprises a pharmaceutically
acceptable excipient and is contacted with cells comprised in an aqueous physiological buffer
such as phosphate buffered saline (PBS) at physiological pH, Eagle's balanced salt solution
(BSS) at physiological pH, PBS or BSS comprising serum or conditioned media comprising
PBS or BSS and serum incubated at 37°C. Said conditions may be modulated as deemed
necessary by one of skill in the art. Subsequent to contacting the cells with the agent, said
cells will be disrupted and the polypeptides of the lysate are fractionated such that a
polypeptide fraction is pooled and contacted with an antibody to be further processed by
immunological assay (e.g., ELISA, immunoprecipitation or Western blot). The pool of
proteins isolated from the "agent-contacted" sample will be compared with a control sample
where only the excipient is contacted with the cells, and an increase or decrease in the
immunologically generated signal from the "agent-contacted" sample compared to the control
will be used to distinguish the effectiveness of the agent.
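
The comparison of the "agent-contacted" sample with the excipient-only control can be illustrated with a short sketch. The replicate signal values (for example, ELISA absorbance readings) and the 1.5-fold cutoff are hypothetical assumptions made for this example; the text above does not prescribe any particular threshold.

```python
from statistics import mean

def classify_agent(agent_signals, control_signals, min_fold=1.5):
    """Compare the mean immunoassay signal of agent-contacted cells with
    that of excipient-only controls. `min_fold` is an assumed cutoff."""
    ratio = mean(agent_signals) / mean(control_signals)
    if ratio >= min_fold:
        return "increase", ratio
    if ratio <= 1.0 / min_fold:
        return "decrease", ratio
    return "no change", ratio

# Hypothetical triplicate readings for treated and control lysates.
call, ratio = classify_agent([0.21, 0.19, 0.20], [0.62, 0.58, 0.60])
```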

Another embodiment of the present invention provides methods for identifying agents
that modulate the levels, concentration or at least one activity of a protein(s) encoded by the
genes in Tables 2-8. Such methods or assays may utilize any means of monitoring or
detecting the desired activity.

In one format, the relative amounts of a protein of the invention produced in a cell
population that has been exposed to the agent to be tested may be compared to the amount
produced in an un-exposed control cell population. In this format, probes such as specific
antibodies are used to monitor the differential expression of the protein in the different cell
populations. Cell lines or populations are exposed to the agent to be tested under appropriate
conditions and time. Cellular lysates may be prepared from the exposed cell line or
population and a control, unexposed cell line or population. The cellular lysates are then
analyzed with the probe, such as a specific antibody.

The genes and ESTs of the present invention may be assayed in any convenient form.
For example, they may be assayed in the form of mRNA or reverse transcribed mRNA. The
genes may be cloned or not and the genes may be amplified or not. The cloning itself does
not appear to bias the representation of genes within a population. However, it may be
preferable to use polyA+ RNA as a source, as it can be used with fewer processing steps. In
some embodiments, it may be preferable to assay the protein or peptide encoded by the gene.

The sequences of the expression marker genes are in the public databases. Tables 2-8 
provide the Accession numbers and name for each of the sequences. In Tables 2-6, the 
number following the notation gb= is the GenBank accession number. The sequences of the 
genes in GenBank are expressly incorporated by reference and are publicly available at, for 
example, www.ncbi.nih.gov. IMAGE gives the clone number from the IMAGE consortium. 

Probe design 

Probes based on the sequences of the genes described herein may be prepared by any 
commonly available method. Oligonucleotide probes for assaying the tissue or cell sample 
are preferably of sufficient length to specifically hybridize only to appropriate, 
complementary genes or transcripts. Typically the oligonucleotide probes will be at least 10,
12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least 30, 40,
or 50 nucleotides will be desirable. 

One of skill in the art will appreciate that an enormous number of array designs are 
suitable for the practice of this invention. The high density array will typically include a 
number of probes that specifically hybridize to the sequences of interest. See WO 99/32660 
for methods of producing probes for a given gene or genes. In addition, in a preferred 
embodiment, the array will include one or more control probes. 

High density array chips of the invention include "test probes." Test probes may be 
oligonucleotides that range from about 5 to about 500 or about 5 to about 50 nucleotides, 
more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to 
about 40 nucleotides in length. In other particularly preferred embodiments, the probes are 
about 20 or 25 nucleotides in length. In another preferred embodiment, test probes are double 
or single strand DNA sequences. DNA sequences may be isolated or cloned from natural 
sources or amplified from natural sources using natural nucleic acid as templates. These 
probes have sequences complementary to particular subsequences of the genes whose 
expression they are designed to detect. Thus, the test probes are capable of specifically 
hybridizing to the target nucleic acid they are to detect. 

In addition to test probes that bind the target nucleic acid(s) of interest, the high
density array can contain a number of control probes. The control probes fall into three
categories referred to herein as (1) normalization controls; (2) expression level controls; and
(3) mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are
complementary to labeled reference oligonucleotides or other nucleic acid sequences that are
added to the nucleic acid sample. The signals obtained from the normalization controls after
hybridization provide a control for variations in hybridization conditions, label intensity,
"reading" efficiency and other factors that may cause the signal of a perfect hybridization to
vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read
from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the
control probes, thereby normalizing the measurements.
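
As a minimal sketch of the normalization just described, each probe signal on an array can be divided by the signal from the normalization control probes on that same array. The probe names and intensities below are hypothetical, and averaging over several control probes is an assumption of this sketch (the text also contemplates a single control probe).

```python
from statistics import mean

def normalize(probe_signals, control_signals):
    """Divide every probe signal by the mean signal of the normalization
    control probes measured on the same array."""
    control = mean(control_signals)
    if control <= 0:
        raise ValueError("normalization control signal must be positive")
    return {probe: value / control for probe, value in probe_signals.items()}

# Two hypothetical arrays: the second was read at twice the overall intensity.
array_1 = normalize({"probe_a": 1200.0, "probe_b": 300.0}, [600.0, 600.0])
array_2 = normalize({"probe_a": 2400.0, "probe_b": 600.0}, [1200.0, 1200.0])
```

After normalization the two arrays give the same values for each probe, which is the purpose of the control: array-wide differences in labeling or reading efficiency cancel out.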

Virtually any probe may serve as a normalization control. However, it is recognized
that hybridization efficiency varies with base composition and probe length. Preferred
normalization probes are selected to reflect the average length of the other probes present in
the array; however, they can be selected to cover a range of lengths. The normalization
control(s) can also be selected to reflect the (average) base composition of the other probes in
the array; however, in a preferred embodiment, only one or a few probes are used and they are
selected such that they hybridize well (i.e., no secondary structure) and do not match any
target-specific probes.

Expression level controls are probes that hybridize specifically with constitutively
expressed genes in the biological sample. Virtually any constitutively expressed gene
provides a suitable target for expression level controls. Typical expression level control
probes have sequences complementary to subsequences of constitutively expressed
"housekeeping genes" including, but not limited to, the β-actin gene, the transferrin receptor
gene, the GAPDH gene, and the like.

Mismatch controls may also be provided for the probes to the target genes, for
expression level controls or for normalization controls. Mismatch controls are
oligonucleotide probes or other nucleic acid probes identical to their corresponding test or
control probes except for the presence of one or more mismatched bases. A mismatched
base is a base selected so that it is not complementary to the corresponding base in the target
sequence to which the probe would otherwise specifically hybridize. One or more
mismatches are selected such that under appropriate hybridization conditions (e.g., stringent
conditions) the test or control probe would be expected to hybridize with its target sequence,
but the mismatch probe would not hybridize (or would hybridize to a significantly lesser
extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a
probe is a twenty-mer, a corresponding mismatch probe may have the identical sequence
except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of
positions 6 through 14 (the central mismatch).
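
The construction of a central-mismatch probe from a perfect-match twenty-mer can be sketched as below. The probe sequence is hypothetical, and replacing the chosen base with its complement is simply one convenient default assumed here; the text above permits any substitution that differs from the original base.

```python
def mismatch_probe(perfect_match, position, substitute=None):
    """Return a copy of `perfect_match` with one base changed at the given
    1-based position. By default the base is replaced with its complement,
    which is an assumed convention, not a requirement of the text."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    index = position - 1
    original = perfect_match[index]
    new_base = substitute or complement[original]
    if new_base == original:
        raise ValueError("substitute must differ from the original base")
    return perfect_match[:index] + new_base + perfect_match[index + 1:]

pm = "ACGTACGTACGTACGTACGT"            # hypothetical 20-mer test probe
mm = mismatch_probe(pm, position=10)   # position 10 lies within positions 6-14
```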

Mismatch probes thus provide a control for non-specific binding or cross
hybridization to a nucleic acid in the sample other than the target to which the probe is
directed. Mismatch probes also indicate whether a hybridization is specific or not. For
example, if the target is present the perfect match probes should be consistently brighter than
the mismatch probes. In addition, if all central mismatches are present, the mismatch probes
can be used to detect a mutation. The difference in intensity between the perfect match and
the mismatch probe (I(pm) - I(mm)) provides a good measure of the concentration of the
hybridized material.
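
A short sketch of how the perfect match and mismatch intensities might be summarized for one gene follows. The intensity values are hypothetical, and reporting the fraction of probe pairs in which the perfect match is brighter is an illustrative choice for checking specificity, not a prescribed statistic.

```python
def probe_pair_summary(pairs):
    """`pairs` is a list of (perfect_match, mismatch) intensities for one gene.
    Returns the mean I(pm) - I(mm) difference and the fraction of pairs in
    which the perfect match is brighter than the mismatch."""
    diffs = [pm - mm for pm, mm in pairs]
    mean_diff = sum(diffs) / len(diffs)
    fraction_pm_brighter = sum(d > 0 for d in diffs) / len(diffs)
    return mean_diff, fraction_pm_brighter

# Hypothetical fluorescence intensities for four probe pairs.
pairs = [(900, 300), (850, 250), (400, 420), (1000, 200)]
mean_diff, fraction = probe_pair_summary(pairs)
```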



Nucleic Acid Samples

As is apparent to one of ordinary skill in the art, nucleic acid samples used in the
methods and assays of the invention may be prepared by any available method or process.
Methods of isolating total mRNA are also well known to those of skill in the art. For
example, methods of isolation and purification of nucleic acids are described in detail in
Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization
With Nucleic Acid Probes, Part I Theory and Nucleic Acid Preparation, Tijssen, (1993)
(editor) Elsevier Press. Such samples include RNA samples, but also include cDNA
synthesized from a mRNA sample isolated from a cell or tissue of interest. Such samples
also include DNA amplified from the cDNA and an RNA transcribed from the amplified
DNA. One of skill in the art would appreciate that it may be desirable to inhibit or destroy
RNase present in homogenates before homogenates can be used.

Biological samples may be of any biological tissue or fluid or cells from any organism
as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently the
sample will be a "clinical sample" which is a sample derived from a patient. Typical clinical
samples include, but are not limited to, sputum, blood, blood-cells (e.g., white cells), tissue or
fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom.
Biological samples may also include sections of tissues, such as frozen sections or
formalin fixed sections taken for histological purposes.

Solid Supports 

Solid supports containing oligonucleotide probes for differentially expressed genes
can be any solid or semisolid support material known to those skilled in the art. Suitable
examples include, but are not limited to, membranes, filters, tissue culture dishes, polyvinyl
chloride dishes, beads, test strips, silicon or glass based chips and the like. Suitable glass
wafers and hybridization methods are widely available, for example, those disclosed by
Beattie (WO 95/11755). Any solid surface to which oligonucleotides can be bound, either
directly or indirectly, either covalently or non-covalently, can be used. In some
embodiments, it may be desirable to attach some oligonucleotides covalently and others
non-covalently to the same solid support.

A preferred solid support is a high density array or DNA chip. These contain a
particular oligonucleotide probe in a predetermined location on the array. Each
predetermined location may contain more than one molecule of the probe, but each molecule
within the predetermined location has an identical sequence. Such predetermined locations
are termed features. There may be, for example, from 2, 10, 100, 1000 to 10,000, 100,000 or
400,000 of such features on a single solid support. The solid support, or the area within
which the probes are attached, may be on the order of a square centimeter.

Oligonucleotide probe arrays for expression monitoring can be made and used
according to any techniques known in the art (see for example, Lockhart et al., Nat.
Biotechnol. (1996) 14, 1675-1680; McGall et al., Proc. Nat. Acad. Sci. USA (1996) 93,
13555-13560). Such probe arrays may contain at least two or more oligonucleotides that are
complementary to or hybridize to two or more of the genes described herein. Such arrays may
also contain oligonucleotides that are complementary or hybridize to at least 3, 4, 5, 6, 7, 8, 9,
10, 20, 30, 50, 70 or more of the genes described herein.

Oligonucleotide arrays are particularly useful for creating gene expression profiles 
comparing cancer tissue to adjacent normal tissue. 

The use of available oligonucleotide arrays enabled the determination of the
expression levels of numerous genes and ESTs simultaneously. From this mass of
expression data, differentially expressed genes were identified using Fold Change and Gene
Signature Differential analysis.

Gene Signature Differential analysis is a method designed to detect genes present in 
one sample set, and absent in another. Genes with differential expression in cancer tissue 
versus normal tissue are better diagnostic and therapeutic targets than genes that do not 
change in expression. 
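
The idea behind a present-in-one-set, absent-in-the-other comparison can be sketched as follows. This is an illustrative reading only and not the actual Gene Signature Differential implementation: the gene names, the "P"/"A" call encoding and the `min_fraction` parameter that defines "commonly" present or absent are all assumptions of this sketch.

```python
def commonly_called(calls_by_sample, call, min_fraction=1.0):
    """Genes given `call` ("P" present or "A" absent) in at least
    `min_fraction` of the samples. `min_fraction` is an assumed parameter."""
    genes = set.intersection(*(set(s) for s in calls_by_sample))
    n = len(calls_by_sample)
    return {
        g for g in genes
        if sum(s[g] == call for s in calls_by_sample) / n >= min_fraction
    }

def signature_differential(set_one, set_two, min_fraction=1.0):
    """Genes commonly present in `set_one` and commonly absent in `set_two`."""
    return (commonly_called(set_one, "P", min_fraction)
            & commonly_called(set_two, "A", min_fraction))

# Hypothetical absolute calls for three genes in two tumor and two normal samples.
tumor = [{"g1": "P", "g2": "P", "g3": "A"}, {"g1": "P", "g2": "A", "g3": "A"}]
normal = [{"g1": "A", "g2": "A", "g3": "A"}, {"g1": "A", "g2": "P", "g3": "P"}]
tumor_only = signature_differential(tumor, normal)
```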

Methods of forming high density arrays of oligonucleotides with a minimal number of 
synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid 
substrate by a variety of methods, including, but not limited to, light-directed chemical 
coupling, and mechanically directed coupling (see Pirrung et al., (1992) U.S. Patent No.
5,143,854; Fodor et al., (1998) U.S. Patent No. 5,800,992; Chee et al., (1998) U.S. Patent No.
5,837,832).

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass 
surface proceeds using automated phosphoramidite chemistry and chip masking techniques. 
In one specific implementation, a glass surface is derivatized with a silane reagent containing 
a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting 
group. Photolysis through a photolithographic mask is used selectively to expose functional
groups which are then ready to react with incoming 5' photoprotected nucleoside 
phosphoramidites. The phosphoramidites react only with those sites which are illuminated 
(and thus exposed by removal of the photolabile blocking group). Thus, the 
phosphoramidites only add to those areas selectively exposed from the preceding step. These 
steps are repeated until the desired array of sequences has been synthesized on the solid
surface. Combinatorial synthesis of different oligonucleotide analogues at different locations 
on the array is determined by the pattern of illumination during synthesis and the order of 
addition of coupling reagents. 

In addition to the foregoing, additional methods which can be used to generate an 
array of oligonucleotides on a single substrate are described in Fodor et al., (1993) WO
93/09668. High density nucleic acid arrays can also be fabricated by depositing premade or
natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are 
deposited on specific locations of a substrate by light directed targeting and oligonucleotide 
directed targeting. Another embodiment uses a dispenser that moves from region to region to 
deposit nucleic acids in specific spots. 

Hybridization

Nucleic acid hybridization simply involves contacting a probe and target nucleic acid
under conditions where the probe and its complementary target can form stable hybrid
duplexes through complementary base pairing (see Lockhart et al., (1999) WO 99/32660).
The nucleic acids that do not form hybrid duplexes are then washed away, leaving the
hybridized nucleic acids to be detected, typically through detection of an attached detectable
label. It is generally recognized that nucleic acids are denatured by increasing the
temperature or decreasing the salt concentration of the buffer containing the nucleic acids.
Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes
(e.g., DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed sequences
are not perfectly complementary. Thus, specificity of hybridization is reduced at lower
stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt)
successful hybridization requires fewer mismatches. One of skill in the art will appreciate
that hybridization conditions may be selected to provide any degree of stringency. In a
preferred embodiment, hybridization is performed at low stringency, in this case in 6x SSPE-T
at 37°C (0.005% Triton X-100) to ensure hybridization, and then subsequent washes are
performed at higher stringency (e.g., 1x SSPE-T at 37°C) to eliminate mismatched hybrid
duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down
to as low as 0.25x SSPE-T at 37°C to 50°C) until a desired level of hybridization specificity is
obtained. Stringency can also be increased by addition of agents such as formamide.
Hybridization specificity may be evaluated by comparison of hybridization to the test probes
with hybridization to the various controls that can be present (e.g., expression level control,
normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and signal
intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency
that produces consistent results and that provides a signal intensity greater than approximately
10% of the background intensity. Thus, in a preferred embodiment, the hybridized array may
be washed at successively higher stringency solutions and read between each wash. Analysis
of the data sets thus produced will reveal a wash stringency above which the hybridization
pattern is not appreciably altered and which provides adequate signal for the particular
oligonucleotide probes of interest.

Signal Detection

The hybridized nucleic acids are typically detected by detecting one or more labels 
attached to the sample nucleic acids. The labels may be incorporated by any of a number of 
means well known to those of skill in the art (see Lockhart et al., (1999) WO 99/32660). 

Databases 

The present invention includes relational databases containing sequence information, 
for instance for the genes of Tables 2-8, as well as gene expression information in various 
esophageal tissue samples. Databases may also contain information associated with a given 
sequence or tissue sample such as descriptive information about the gene associated with the 
sequence information, or descriptive information concerning the clinical status of the tissue 
sample, or the patient from which the sample was derived. The database may be designed to 
include different parts, for instance a sequences database and a gene expression database. 
Methods for the configuration and construction of such databases are widely available, for 
instance, see Akerblom et al, (1999) U.S. Patent 5,953,727, which is specifically 
incorporated herein by reference in its entirety. 
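
One way such a database with separate sequence and gene expression parts might be laid out is sketched below using Python's built-in sqlite3 module. The table names, column names, accession string and expression values are all hypothetical and chosen only for illustration; they do not reflect the schema of any database referenced above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequences (
    accession   TEXT PRIMARY KEY,   -- e.g. a GenBank accession number
    description TEXT
);
CREATE TABLE samples (
    sample_id   INTEGER PRIMARY KEY,
    tissue_type TEXT                -- e.g. 'normal' or 'BA'
);
CREATE TABLE expression (
    accession TEXT REFERENCES sequences(accession),
    sample_id INTEGER REFERENCES samples(sample_id),
    level     REAL
);
""")
conn.executemany("INSERT INTO sequences VALUES (?, ?)",
                 [("ACC0001", "hypothetical marker 1")])
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(1, "normal"), (2, "BA"), (3, "BA")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [("ACC0001", 1, 50.0), ("ACC0001", 2, 400.0), ("ACC0001", 3, 600.0)])

# Mean expression level of one marker in each tissue type.
rows = conn.execute("""
    SELECT s.tissue_type, AVG(e.level)
    FROM expression e JOIN samples s ON s.sample_id = e.sample_id
    WHERE e.accession = ?
    GROUP BY s.tissue_type
    ORDER BY s.tissue_type
""", ("ACC0001",)).fetchall()
```

Keeping sequence annotation and expression measurements in separate tables joined by accession number mirrors the division into a sequences database and a gene expression database described above.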

The databases of the invention may be linked to an outside or external database. In a 
preferred embodiment, as described in Tables 2-8, the external database is GenBank and the 
associated databases maintained by the National Center for Biotechnology Information 
(NCBI). 

Any appropriate computer platform may be used to perform the necessary 
comparisons between sequence information, gene expression information and any other 
information in the database or provided as an input. For example, a large number of 
computer workstations are available from a variety of manufacturers, such as those available
from Silicon Graphics. Client-server environments, database servers and networks are also 
widely available and appropriate platforms for the databases of the invention. 

The databases of the invention may be used to produce, among other things, electronic 
Northerns to allow the user to determine the cell type or tissue in which a given gene is 
expressed and to allow determination of the abundance or expression level of a given gene in 
a particular tissue or cell. 

The databases of the invention may also be used to present information identifying the 
expression level in a tissue or cell of a set of genes comprising at least one gene in Tables 2-8 
comprising the step of comparing the expression level of at least one gene in Tables 2-8 in the
tissue to the level of expression of the gene in the database. Such methods may be used to 
predict the physiological state of a given tissue by comparing the level of expression of a 
gene or genes in Tables 2-8 from a sample to the expression levels found in tissue from 
normal esophageal tissue, tissue from esophageal adenocarcinoma or both. Such methods 
may also be used in the drug or agent screening assays as described herein. 

Without further description, it is believed that one of ordinary skill in the art can, 
using the preceding description and the following illustrative examples, make and utilize the 
compounds of the present invention and practice the claimed methods. The following 
working examples, therefore, specifically point out the preferred embodiments of the present
invention, and are not to be construed as limiting in any way the remainder of the disclosure. 

EXAMPLES 

Example 1: Tissue Sample Acquisition and Preparation 

For tissue specimens, nine normal esophagus samples and eight BA tissue samples, 
which included seven matched tumor-normal sets, were used. Six of the eight BA samples 
were lymph node invasive. 

With minor modifications, the sample preparation protocol followed the Affymetrix
GeneChip Expression Analysis Manual. Frozen tissue was first ground to powder using the
Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Life
Technologies). The total RNA yield for each sample (average tissue weight of 300 mg) was
200-500 µg. Next, mRNA was isolated using the Oligotex mRNA Midi kit (Qiagen). Since
the mRNA was eluted in a final volume of 400 µl, an ethanol precipitation step was required
to bring the concentration to 1 ng/µl. Using 1-5 ng of mRNA, double stranded cDNA was
created using the Superscript Choice system (Gibco-BRL). First strand cDNA synthesis was
primed with a T7-(dT)24 oligonucleotide. The cDNA was then phenol-chloroform extracted
and ethanol precipitated to a final concentration of 1 µg/µl.

From 2 µg of cDNA, cRNA was synthesized according to standard procedures. To
biotin label the cRNA, nucleotides Bio-11-CTP and Bio-16-UTP (Enzo Diagnostics) were
added to the reaction. After a 37°C incubation for six hours, the labeled cRNA was cleaned
up according to the RNeasy Mini kit protocol (Qiagen). The cRNA was then fragmented (5x
fragmentation buffer: 200 mM Tris-Acetate (pH 8.1), 500 mM KOAc, 150 mM MgOAc) for
thirty-five minutes at 94°C.

55 µg of fragmented cRNA was hybridized on the Human Genome 
U95 set of arrays for twenty-four hours at 60 rpm in a 45°C hybridization oven. The chips 
were washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in 
Affymetrix fluidics stations. To amplify staining, SAPE solution was added twice with an 
anti-streptavidin biotinylated antibody (Vector Laboratories) staining step in between. 
Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard 
Gene Array Scanner). Following hybridization and scanning, the microarray images were 
analyzed for quality control, looking for major chip defects or abnormalities in hybridization 
signal. After all chips passed QC, the data was analyzed using Affymetrix GeneChip 
software (v3.0), and Experimental Data Mining Tool (EDMT) software (v1.0). 

Example 2: Gene Expression Analysis 

All samples were prepared as described and hybridized onto the Affymetrix Human 
Genome U95 array set. 

Each chip contains 16-20 oligonucleotide probe pairs per gene or cDNA clone. These 
probe pairs include perfectly matched sets and mismatched sets, both of which are necessary 
for the calculation of the average difference. The average difference is a measure of the 
intensity difference for each probe pair, calculated by subtracting the intensity of the 
mismatch from the intensity of the perfect match. This takes into consideration variability in 
hybridization among probe pairs and other hybridization artifacts that could affect the 
fluorescence intensities. Using the average difference value that has been calculated, the 
GeneChip software then makes an absolute call for each gene or EST. 
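
The average difference calculation described above can be sketched as follows. This is a minimal illustration only: it assumes a plain mean of the perfect match minus mismatch differences over all probe pairs, without any outlier handling the GeneChip software may apply, and the function name and intensity values are hypothetical.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Mean of (perfect match - mismatch) over the probe pairs of one gene.

    probe_pairs: list of (perfect_match, mismatch) intensity tuples.
    """
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical intensities for one gene (a real probe set holds 16-20 pairs).
pairs = [(1200.0, 300.0), (950.0, 410.0), (1010.0, 280.0), (870.0, 350.0)]
avg_diff = average_difference(pairs)
```

A low or negative average difference indicates that the perfect match probes are not brighter than their mismatch controls, which is the kind of evidence an absent call would rest on.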

The absolute call of present, absent or marginal is used to generate a Gene Signature, 
a tool used to identify those genes that are commonly present or commonly absent in a given 
sample set, according to the absolute call. For each set of samples, a median average 
difference was calculated using the average differences of each individual sample within the set. 
The Gene Signature for one set of samples is compared to the Gene Signature of another set 
of samples to determine the Gene Signature Differential. This comparison identifies the 
genes that are consistently present in one set of samples and consistently absent in the second 
set of samples. 
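
A minimal sketch of the Gene Signature and Gene Signature Differential, assuming absolute calls are encoded as "P" (present), "A" (absent) and "M" (marginal). The function names, the `min_fraction` threshold and the example calls are illustrative assumptions, not the EDMT software's interface.

```python
def gene_signature(calls_by_sample, min_fraction=1.0):
    """Return (present, absent) gene sets for one sample set.

    calls_by_sample: list of dicts mapping gene -> "P", "A" or "M".
    min_fraction: share of samples that must agree (1.0 = every sample).
    """
    n = len(calls_by_sample)
    genes = set().union(*calls_by_sample)
    present, absent = set(), set()
    for gene in genes:
        p = sum(1 for calls in calls_by_sample if calls.get(gene) == "P")
        a = sum(1 for calls in calls_by_sample if calls.get(gene) == "A")
        if p / n >= min_fraction:
            present.add(gene)
        elif a / n >= min_fraction:
            absent.add(gene)
    return present, absent


def signature_differential(sig_a, sig_b):
    """Genes present in A but absent in B, and genes present in B but absent in A."""
    present_a, absent_a = sig_a
    present_b, absent_b = sig_b
    return present_a & absent_b, present_b & absent_a


# Hypothetical calls for two normal and two BA samples.
normal = [{"TGM3": "P", "TFF1": "A", "ACTB": "P"},
          {"TGM3": "P", "TFF1": "A", "ACTB": "P"}]
ba = [{"TGM3": "A", "TFF1": "P", "ACTB": "P"},
      {"TGM3": "A", "TFF1": "P", "ACTB": "P"}]

over_in_ba, under_in_ba = signature_differential(
    gene_signature(ba), gene_signature(normal))
```

Lowering `min_fraction` below 1.0 relaxes "consistently" to "in most samples", which trades stringency for tolerance of a single discordant sample.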

The Gene Signature Curve is a graphic view of the number of genes consistently 
present in a given set of samples as the sample size increases, taking into account the genes 
commonly expressed among a particular set of samples, and discounting those genes whose 
expression is variable among those samples. The curve is also indicative of the number of 
samples necessary to generate an accurate Gene Signature. As the sample number increases, 
the number of genes common to the sample set decreases. The curve is generated using the 
positive Gene Signatures of the samples in question, determined by adding one sample at a 
time to the Gene Signature, beginning with the sample with the smallest number of present 
genes and adding samples in ascending order. The curve displays the sample size required 
for the most consistency and the least amount of expression variability from sample to 
sample. The point where this curve begins to level off represents the minimum number of 
samples required for the Gene Signature. Graphed on the x-axis is the number of samples in 
the set, and on the y-axis is the number of genes in the positive Gene Signature. As a general 
rule, the acceptable percent of variability in the number of positive genes between two sample 
sets should be less than 5%. 
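
The construction of the Gene Signature Curve can be sketched as below, assuming each sample is reduced to the set of genes called present. The function name and the toy gene identifiers are hypothetical.

```python
def gene_signature_curve(present_sets):
    """Number of genes common to the first k samples, for k = 1..n.

    present_sets: one set of present genes per sample. Samples are added in
    ascending order of how many genes they express, as described in the text.
    """
    ordered = sorted(present_sets, key=len)
    curve = []
    common = None
    for genes in ordered:
        common = set(genes) if common is None else common & genes
        curve.append(len(common))
    return curve


# Hypothetical present-gene sets for four samples.
samples = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4", "g5"},
    {"g1", "g2", "g3", "g5", "g6"},
]
curve = gene_signature_curve(samples)  # y-values; the x-values are 1..len(samples)
```

Because each step intersects with one more sample, the curve can only stay flat or fall, and the point at which it flattens suggests how many samples are needed.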

Example 3: Expression Profiles 

Using the above described methods, genes that were predominantly over-expressed 
in BA, or predominantly under-expressed in BA, were identified. The revealed genes were 
used to identify gene clusters generated by hierarchical clustering that exhibited a consistent 
fold change and/or dominant expression pattern between the normal and diseased sample sets. 
Genes with consistent differential expression patterns provide potential targets for broad 
range diagnostics and therapeutics. 

First, the expression profiles of the nine normal esophagus samples were pooled and 
used to determine the genes that are commonly expressed or commonly not expressed. To 
find the expression pattern consistent with disease, the same procedure was followed with the 
eight samples from patients with BA. Table 1 lists, by array type, the number of genes with 
expression patterns common to the majority of normal or diseased samples. 

Next, the unique pattern of genes over-expressed in the disease was identified by 
determining those genes that were commonly expressed in BA, but commonly NOT 
expressed in normal esophagus. Similarly, the unique pattern of genes under-expressed in 
disease was found by identifying genes that were expressed in the majority of normal 
esophagus samples, but NOT expressed in the majority of BA samples. Table 1 lists the 
number of genes uniquely under-expressed and over-expressed in BA by array type. With 
this method, 423 genes were identified as unique to BA. 

Example 4: Fold Change Analysis 

The data was first filtered to exclude all genes that showed no expression in any of the 
samples. The ratio (tumor/normal) was calculated by comparing the mean expression value 
for each gene in the BA sample set against the mean expression value of that gene in the 
normal esophagus sample set. Genes were included in the analysis if they had a fold 
change > 3 in either direction, and a P value < 0.05 as determined by a two-tail unequal 
variance t-test. Out of the ~60,000 genes surveyed by the Human Genome U95 set, 1584 genes 
were present in the overall fold change analysis; 701 were over-expressed in BA and 883 were 
under-expressed in BA. Out of the 423 unique genes for BA (244 under-expressed and 179 
over-expressed) previously identified, 170 were also present in the fold change analysis. 
Determining these 170 genes independently by both methods overcomes the limitations of 
accuracy inherent in either method. These 170 key disease-related genes all have 
significant overall fold changes; 87 are not detectable in BA, while the remaining 83 are 
not detectable in normal esophagus. 
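
The selection rule used in the fold change analysis (ratio of group means above 3 or below 1/3, with P < 0.05 from a two-tail unequal variance t-test) can be sketched with the Python standard library alone. The function names and expression values are hypothetical, the P value is obtained by numerical integration of the Student's t density rather than by any particular statistics package, and positive mean expression values are assumed.

```python
import math
from statistics import mean, variance


def two_tailed_p(t, df, steps=20000):
    """P(|T| > t) for Student's t, via Simpson's rule on [0, t] (steps is even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def welch_t_test(a, b):
    """Two-tail, unequal-variance t-test. Returns (t, df, p)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(abs(t), df)


def fold_change_hit(tumor, normal, min_fold=3.0, alpha=0.05):
    """Return (ratio, p, passes) for one gene; assumes positive mean values."""
    ratio = mean(tumor) / mean(normal)
    _, _, p = welch_t_test(tumor, normal)
    passes = (ratio > min_fold or ratio < 1.0 / min_fold) and p < alpha
    return ratio, p, passes


# Hypothetical expression values for one gene in five BA and five normal samples.
ratio, p, passes = fold_change_hit([900, 1100, 1000, 950, 1050],
                                   [300, 280, 320, 310, 290])
```

The unequal variance form is a natural fit here because tumor samples are typically more heterogeneous than normal tissue, so pooling the two variances would be hard to justify.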

The genes identified in the fold change analysis are listed in Tables 2-6. Table 2 lists 
those genes identified using the Human Genome U95A chip, Table 3 lists those genes 
identified using the Human Genome U95B chip, Table 4 lists those genes identified using the 
Human Genome U95C chip, Table 5 lists those genes identified using the Human Genome 
U95D chip and Table 6 lists those genes identified using the Human Genome U95E chip. 

Example 5: Cluster Analysis 

The data was first filtered to exclude all genes that showed no expression in any of the 
samples. To normalize the data, fold change values for the samples were calculated by 
dividing each gene expression value by the mean of the expression values for all samples, 
both normal esophagus and BA, for that gene. Genes were included in the cluster analysis if 
they had at least one instance of a fold change > 3 in either direction, and a P value of < 0.05 
as assessed by a two-tail unequal variance t-test. Using a hierarchical clustering algorithm, 
genes were grouped according to their expression pattern similarities across all samples 
(Eisen, et al., Cluster analysis and display of genome-wide expression patterns. Proc. Natl. 
Acad. Sci. USA 95, 14863-14868 (1998)). 
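
The normalization and grouping steps can be illustrated with a small standard-library sketch. It assumes average linkage on a distance of one minus the Pearson correlation, which is one common form of hierarchical clustering and not necessarily the exact variant used here. The t-test filter is omitted, profiles are assumed to be non-constant, and all names and values are hypothetical.

```python
import math
from statistics import mean


def normalize(profile):
    """Divide each value by the gene's mean over all samples."""
    m = mean(profile)
    return [v / m for v in profile]


def has_large_change(normalized, min_fold=3.0):
    """True if any sample shows a fold change beyond min_fold in either direction."""
    return any(v > min_fold or v < 1.0 / min_fold for v in normalized)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def average_linkage(profiles):
    """Agglomerative clustering of genes; returns the merges in order."""
    clusters = [(name,) for name in profiles]

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        return mean(1 - pearson(profiles[i], profiles[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        pairs = ((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:])
        a, b = min(pairs, key=lambda pair: dist(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


# Hypothetical expression values: three normal samples, then three BA samples.
raw = {
    "geneA": [100, 120, 110, 900, 950, 1000],
    "geneB": [50, 60, 55, 400, 500, 450],
    "geneC": [800, 850, 900, 100, 90, 110],
    "geneD": [400, 420, 380, 410, 390, 400],
}
normalized = {gene: normalize(values) for gene, values in raw.items()}
kept = {gene: v for gene, v in normalized.items() if has_large_change(v)}
merges = average_linkage(kept)
```

In this toy data the two genes that rise in BA merge first, and the gene that falls in BA joins last, mirroring how over-expressed and under-expressed genes separate into distinct branches of a dendrogram.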

For the Human Genome U95A array, 1,100 full-length known genes or ESTs (8.7% of 
the genes present on the array) were included in the cluster analysis. The resulting 
dendrogram (Fig. 1) grouped all nine normal esophagus samples and seven of the eight BA 
samples into separate trees. BA sample 316 clustered in a branch with its matched normal 
esophagus sample (315) rather than with the other tumors. A number of genes on the Human 
Genome U95A array are present in duplicate. In most cases the duplicate genes cluster next 
to each other or in close proximity to each other, verifying internal microarray 
reproducibility. Four clusters were chosen for in-depth analyses, based on the presence of a 
portion of the 170 key disease-related genes previously identified by our fingerprinting and 
fold change analysis methods. Figure 1 shows the results obtained using hierarchical 
clustering to measure expression variation for 1,100 full-length genes present on the 
Affymetrix Human Genome U95A oligonucleotide array. Four clusters (a-d) are presented 
that include genes from the 170 gene list identified by both our analysis methods. Those 
genes are labeled in red. Cluster (a) contains genes under-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), while clusters (b-d) contain genes over-expressed in BA. 
The dendrogram summarizes the expression similarities between samples. Each gene is 
represented by a single row, and each sample by a single column. Relative to the mean 
expression level of all samples, red squares represent over-expression, green squares 
represent under-expression, black squares represent no expression change, and grey 
squares denote a missing sample. The overall fold change (FC), the fold change calculated 
between the two groups of samples, is also listed for each gene. 

Figure 2 shows the results obtained from a clustering analysis performed for 4,521 
genes from the Human Genome U95 array set. A representative cluster was chosen that 
contained a number of genes from the U95A analysis (Figure 1, cluster d). Genes in common 
between clusters are labeled in green. Based on expression similarities to known genes, the 
biological function of ESTs can be postulated. The genes thus identified are listed in Table 8. 

The cluster analysis also identified genes not identified in the fold change analysis. 
Table 7 provides a list of those genes identified as present in the U95A chip cluster analysis 
but not identified as present in the fold change analysis. 

The clusters of genes thus identified contain genes that exhibit a consistent fold 
change between the normal and diseased sample sets, providing targets for broad range 
diagnostics and therapeutics. 

Example 6: Tissue markers 

As the progression from normal esophagus to BA occurs, squamous epithelial cells 
are replaced with a heterogeneous population of columnar cells that exhibit both intestinal 
and gastric-like characteristics. The methods of the present invention were used to identify 
clusters containing genes differentially expressed in all normal or diseased samples. The 
genes thus identified were screened for the presence of marker genes corresponding to gross 
morphological changes. 

The stratified squamous epithelial terminal differentiation markers, transglutaminase 
1, transglutaminase 3, involucrin, envoplakin, periplakin and sciellin were all present in the 
cluster of genes under-expressed in BA (Figure 1a). A distinct cluster (Figure 1c) was also 
identified that included over-expressed genes associated with the Barrett's esophagus 
phenotype (see Labouvie, et al., Differential expression of mucins and trefoil peptides in 
native epithelium, Barrett's metaplasia and squamous cell carcinoma of the oesophagus. J. 
Cancer Res. Clin. Oncol. 125, 71-6 (1999) and Westerveld, et al., Gastric proteases in 
Barrett's esophagus. Gastroenterology 93, 774-8 (1987)). The genes trefoil peptide 1 
(TFF-1), trefoil peptide 2 (TFF-2), mucin 5B, and pepsinogen C were present in this cluster. 

Example 7: Metastasis-related genes 

The majority of BA tumors in this study (6 out of 8) displayed regional lymph node 
invasion. Genes with expression changes that correlate highly with the metastatic phenotype 
are very valuable diagnostic markers. The first step in metastasis is the loss of cell adhesion 
at the primary site. Desmosomes are multi-component structures involved in epithelial cell to 
cell adhesion and intracellular anchoring of intermediate filaments. The desmosomal 
components, desmoglein 3, desmocollin 2, and desmoplakin, are all present in the cluster of 
genes under-expressed in BA (Figure 1a). 

Once cell to cell adhesion is broken, the extracellular matrix (ECM) must be breached 
to enable movement into metastatic sites. A number of proteases, including 
metalloproteinase 1 (MMP-1), metalloproteinase 11 (MMP-11), cathepsin E, cathepsin K, 
and urokinase plasminogen activator (u-PA), that are involved in basement membrane and 
ECM degradation are spread throughout the clusters containing genes over-expressed in BA 
(Figures 1b-1d). MMP-1, MMP-11, and u-PA expression has previously been correlated with 
metastasis and/or poor prognosis in esophageal carcinoma (see Murray, et al., Matrix 
metalloproteinase-1 is associated with poor prognosis in oesophageal cancer. J. Pathol. 185, 
256-61 (1998), Porte, et al., Overexpression of stromelysin-3, BM-40/SPARC, and MET 
genes in human esophageal carcinoma: implications for prognosis. Clin. Cancer Res. 4, 
1375-82 (1998) and Hewin, et al., Plasminogen activators in oesophageal carcinoma. Br. J. 
Surg. 83, 1152-5 (1996)). 

In parallel with the expression increase in ECM proteinases, an expression decrease 
was seen in a number of proteinase inhibitors, including squamous cell carcinoma antigen 1 
(SCCA1), squamous cell carcinoma antigen 2 (SCCA2), cystatin 6, and ELANH2 (Fig. 3A, 
i). The loss of proteinase inhibitors may allow metastatic tumor progression to occur more 
rapidly. 

As the tumor moves through the stromal compartment into secondary sites, a balance 
must be reached between ECM degradation and renewal. The tumor requires the breakdown 
of ECM components to enable invasion, but the stromal environment must also be altered to 
create an environment to which the tumor can adhere and through which it can migrate. 
SPARC/osteonectin, SPP1/osteopontin, and thrombospondin-2 are secreted proteins involved 
in mediating cell to matrix interactions. These genes cluster together (Figure 1d), and are 
over-expressed in BA. SPARC, SPP-1, and thrombospondin-1 have previously been linked to 
oesophageal carcinoma (see Porte, et al., supra, Casson, et al., Ras mutation and expression 
of the ras-regulated genes osteopontin and cathepsin L in human esophageal cancer. Int. J. 
Cancer 72, 739-45 (1997) and Oshiba, et al., Stromal thrombospondin-1 expression is 
correlated with progression of esophageal squamous cell carcinoma. Anticancer Res. 19, 
4375-8 (1999)). 

Further denoting the changes in the stromal environment, the ECM components, 
chondroitin sulfate proteoglycan 2, collagen type XI alpha 1, and collagen type X alpha 1, are 
also present in this cluster (Figure 1d). 



Example 8: Other Gene Clusters 

A number of additional clusters, besides those directly related to the metastatic 
process, have been identified. Reflecting a change in the tumor's transcriptional program, 
one distinct cluster under-expressed in BA contained the homeobox genes, PITX1, PAX9 and 
BARX2 (Figure 1a). The homeobox gene, HOXB7, was over-expressed in BA (Figure 1b). 
Homeobox genes are nuclear transcription factors that regulate development. Another cluster 
contained genes elicited by the body's anti-tumoral immune response (Figure 1b). Two genes 
induced by interferon alpha and beta, IFI35 and IFI30, two genes induced by interferon 
gamma, ISG15 and GEP3, and interferon-induced complement component C2 were present in 
a cluster over-expressed in BA (Figure 1b). Natural killer transcript 4 (NK4) also clusters 
with these genes. 



Example 9: EST Clustering 

Clustering was performed for the full Human Genome U95 set. After filtering, 4,521 
genes (7.5% of the genes present on all 5 arrays) were analyzed via hierarchical clustering 
and the results are shown in Figure 2. A list of the genes thus identified is provided in Table 
8. Hierarchical clustering was used to measure expression variation for 4,521 known genes 
or ESTs from the Affymetrix HG-U95 array set. Three clusters are shown that include genes 
from the HG-U95A analysis (see Figure 1). Genes in common between clusters are labeled 
in green. The dendrogram summarizes expression similarities between samples. Each gene 
and sample presentation is the same as in Figure 1. The overall fold change (FC), the fold 
change between the groups of tissue samples, is also listed for each gene. Based on 
expression similarities to known genes, the biological function of some ESTs can be 
assigned. Cluster A represents a number of marker genes for squamous epithelial cells. 
ESTs grouped around these genes are novel diagnostic markers whose expression loss 
follows BA progression. Cluster B represents a number of genes involved in ECM 
modification. Cluster C represents genes involved in cell adhesion, migration, proliferation 
and differentiation. Interestingly, EST AA877900 clusters around the cell surface protein 
encoded by tetraspanins and shows homology to the mouse cell surface antigen 114/A10 
precursor. The resulting dendrogram grouped all nine normal oesophagus and all eight BA 
samples into separate trees. Figure 2 shows the incorporation of these additional genes, 
consisting primarily of ESTs, into the Human Genome U95A cluster (Figure 1). The U95A 
cluster contained a number of proteins involved in extracellular matrix modification and 
structure. Based on expression similarities to known genes, the biological function of 
surrounding ESTs can be postulated. Supporting this theory, the extracellular matrix 
proteins, collagen type V alpha 2, biglycan, and SPP1 (osteopontin) are represented in the 
new cluster (Figure 2). 

The present invention provides methods to identify genes and ESTs that are 
differentially expressed in normal and cancerous esophageal tissue. The method entails using 
several tissues of the same disease type to identify the gene expression patterns that are 
unique to normal and diseased tissues, comparing these patterns to determine the expression 
patterns that uniquely identify the disease, and performing fold change analysis to discover 
which genes are the most important determinants of disease. Applying the method, 
Applicants have identified key disease-related genes, and furthermore demonstrate that these 
weighted genes can be used to identify significant clusters generated by hierarchical 
clustering algorithms. This overall approach can potentially determine novel targets for 
diagnostic and therapeutic intervention in a wide variety of tissues, as demonstrated here with 
BA. 

Although the present invention has been described in detail with reference to 
examples above, it is understood that various modifications can be made without departing 
from the spirit of the invention. Accordingly, the invention is limited only by the following 
claims. All cited patents and publications referred to in this application are herein 
incorporated by reference in their entirety. 

Table 3. U95_B Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Gene Name | P-value | Fold Change 
Cluster Incl. AI341166:qx89h02.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2009715 /clone_end=3' /gb=AI341166 /gi=[illegible] /ug=Hs.233977 /len=257' | 0.000442 | 4.699694547 
Cluster Incl. AI557210:PT2.1_14_H10.r Homo sapiens cDNA, 3 end /clone_end=3' /gb=AI557210 /gi=[illegible] /ug=Hs.41971 /len=[illegible]' | 0.00302 | 4.685097403 
Cluster Incl. W74476:zd75a11.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=W74476 /gi=[illegible] /ug=[illegible] /len=675' | 0.003285 | 4.663360742 
Cluster Incl. AI979261:wr72g05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2493272 /clone_end=3' /gb=AI979261 /gi=[illegible] /ug=Hs.102720 /len=809' | 0.001197 | 4.605593005 
Cluster Incl. AI670876:wa06c12.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2297302 /clone_end=3' /gb=AI670876 /gi=[illegible] /ug=Hs.44276 /len=798' | 0.013189 | 4.577288892 
Cluster Incl. AI669308:wb85b10.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2312443 /clone_end=3' /gb=AI669308 /gi=[illegible] /ug=Hs.196337 /len=632' | 0.001927 | 4.508059219 
Cluster Incl. AI631355:tz83d05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2295177 /clone_end=3' /gb=AI631355 /gi=[illegible] /ug=Hs.92096 /len=390' | 0.001025 | 4.461870453 
Cluster Incl. AI675453:wb99f04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2313823 /clone_end=3' /gb=AI675453 /gi=4875933 /ug=Hs.21432 /len=562' | 0.004424 | 4.443984215 
Cluster Incl. AI703454:we24d09.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI703454 /gi=[illegible] /ug=Hs.26176 /len=567' | 0.004589 | 4.381967582 
Cluster Incl. N20945:yx54f12.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=N20945 /gi=[illegible] /ug=Hs.12210 /len=621' | 0.000004 | 4.354613332 
Cluster Incl. AI610692:tp40f03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI610692 /gi=[illegible] /ug=Hs.234412 /len=474' | 0.00009 | 4.340360862 
Cluster Incl. AI860751:wl05b07.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI860751 /gi=[illegible] /ug=Hs.182476 /len=637' | 0.004106 | 4.319799905 
Cluster Incl. AA447232:zw93a05.r1 Homo sapiens cDNA, 5 end /clone=IMAGE-784496 /clone_end=5' /gb=AA447232 /gi=2159897 /ug=Hs.34806 /len=580' | 0.008281 | 4.254851949 
Cluster Incl. AA621124:af34f07.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AA621124 /gi=2525063 /ug=Hs.93135 /len=639' | 0.00289 | 4.238866373 

P-value 

0.000913 

Fold Change 

5.528854288 

5.400844754 

Table 5. U95_D Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

P-value 

0.004595 

0.009315 

0.001649 

0.000543 

Fold Change 

Table 5. U95_D Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Affy ID Gene Name 

P-value 


0.001388 


0.000022 


0.001615 


0.000035 


0.02766 


0.001194 


0.001194 


0.00004 


0.007339 


0.002798 


0.000081 


0.001955 


0.037893 


0.000162 


Fold Change 



0.209176341 


0.208973443 


0.208676712 


0.207370287 


0.20599295 


0.204116157 


0.198979696 


0.198366813 


0.197923908 


0.195398279 


0.193631412 


0.193436596 


0.190979429 


0.184683765 


Table 5. U95_D Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Affy ID Gene Name 

Cluster Incl. AI053597:qi72e03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI053597 /gi=3321384 /ug=Hs.133138 /len=546' 
Cluster Incl. AA367838:EST79039 Homo sapiens cDNA /clone=ATCC-172567 /gb=AA367838 /gi=[illegible] /ug=[illegible] /len=374 
Cluster Incl. AI865729:wk50e02.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI865729 /gi=[illegible] /ug=[illegible] /len=283' 
Cluster Incl. AI440266:tj01e04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2140254 /clone_end=3' /gb=AI440266 /gi=4281451 /ug=Hs.170673 /len=461' 
Cluster Incl. AA928646:om75f04.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AA928646 /gi=3076937 /ug=Hs.234976 /len=488' 
Cluster Incl. AI052543:oz27h04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-1676599 /clone_end=3' /gb=AI052543 /gi=[illegible] /ug=Hs.133244 /len=452' 
Cluster Incl. AA977896:oq62b04.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1590895 /clone_end=3' /gb=AA977896 /gi=3155342 /ug=Hs.128873 /len=[illegible]' 
Cluster Incl. AI640523:wa29b01.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2299465 /clone_end=3' /gb=AI640523 /gi=4703632 /ug=Hs.223553 /len=442' 
Cluster Incl. AI379425:tc66g04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI379425 /gi=4189278 /ug=Hs.160942 /len=472' 
Cluster Incl. AI610910:tt60a11.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI610910 /gi=4620077 /ug=[illegible] /len=122' 
Cluster Incl. AI371042:ta29f06.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2045507 /clone_end=3' /gb=AI371042 /gi=[illegible] /ug=Hs.160911 /len=484' 
Cluster Incl. AI650477:wa91d08.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2303535 /clone_end=3' /gb=AI650477 /gi=[illegible] /ug=Hs.197758 /len=478' 
Cluster Incl. AI703361:wd93d02.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2339139 /clone_end=3' /gb=AI703361 /gi=4991261 /ug=Hs.202354 /len=530' 

Table 5. U95_D Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Gene Name | P-value | Fold Change 
Cluster Incl. AA994249:ou05b11.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1625373 /clone_end=3' /gb=AA994249 /gi=3180794 /ug=Hs.129479 /len=414' | 0.0023 | 0.182835726 
Cluster Incl. AI375662:tc09c10.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2063346 /clone_end=3' /gb=AI375662 /gi=4175652 /ug=Hs.232023 /len=[illegible]' | 0.001296 | 0.181763269 
Cluster Incl. AA552017:ng01g11.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AA552017 /gi=2322269 /ug=Hs.162245 /len=460' | 0.002115 | 0.17951496 
Cluster Incl. AA721234:nz72b08.s1 Homo sapiens cDNA /clone=IMAGE-1300983 /gb=AA721234 /gi=[illegible] /ug=Hs.121121 /len=345 | 0.010479 | 0.17343166 
Cluster Incl. AI360231:qy84d11.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI360231 /gi=4111852 /ug=Hs.170245 /len=485' | 0.029254 | [illegible] 
Cluster Incl. AI821803:nr20b05.x5 Homo sapiens cDNA /clone=IMAGE-1168497 /gb=AI821803 /gi=[illegible] /ug=Hs.136580 /len=307 | 0.000004 | 0.170191727 
Cluster Incl. AI436290:th81c01.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2125056 /clone_end=3' /gb=AI436290 /gi=4309151 /ug=Hs.164162 /len=497' | 0.000324 | 0.169417008 
Cluster Incl. AI469896:tj88c11.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2148596 /clone_end=3' /gb=AI469896 /gi=4331986 /ug=Hs.158866 /len=459' | 0.002275 | 0.164920143 
Cluster Incl. AI377752:te56h12.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2090759 /clone_end=3' /gb=AI377752 /gi=[illegible] /ug=Hs.129448 /len=429' | 0.000178 | 0.161975336 
Cluster Incl. AA843562:aj54f01.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1394137 /clone_end=3' /gb=AA843562 /gi=[illegible] /ug=Hs.163277 /len=435' | 0.003363 | 0.15021849 
Cluster Incl. AI739473:wi14a05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2390192 /clone_end=3' /gb=AI739473 /gi=5101454 /ug=Hs.233630 /len=463' | 0.000049 | 0.142318725 
Cluster Incl. AA700621:zi43a01.s1 Homo sapiens cDNA, 3 end /clone=433512 /clone_end=3' /gb=AA700621 /gi=[illegible] /ug=Hs.188964 /len=519' | 0.000066 | 0.138942719 
Cluster Incl. AI701529:we36a05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2343152 /clone_end=3' /gb=AI701529 /gi=[illegible] /ug=Hs.129519 /len=507' | 0.000207 | 0.134163985 
Cluster Incl. AW025687:wu07b01.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2516233 /clone_end=3' /gb=AW025687 /gi=5879217 /ug=Hs.156452 /len=408' | 0.000971 | 0.132765609 

Table 6. U95_E Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Gene Name | P-value | Fold Change 
Cluster Incl. AI934965:wd17a03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2328364 /clone_end=3' /gb=AI934965 /gi=5673835 /ug=Hs.181261 /len=588' | 0.020207 | 3.300123522 
Cluster Incl. AI686521:tu34g02.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI686521 /gi=[illegible] /ug=Hs.194118 /len=597' | 0.011834 | 3.296015126 
Cluster Incl. AI082708:ox59f01.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1660633 /clone_end=3' /gb=AI082708 /gi=[illegible] /ug=Hs.31588 /len=369' | 0.009528 | 3.287284047 
Cluster Incl. H18887:yn52d11.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=H18887 /gi=[illegible] /ug=Hs.181836 /len=[illegible]' | 0.000041 | 3.283264496 
Cluster Incl. AI632972:tx55h08.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2273535 /clone_end=3' /gb=AI632972 /gi=[illegible] /ug=Hs.123370 /len=562' | 0.001223 | 3.270101441 
Cluster Incl. AI916544:wa26h03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2299253 /clone_end=3' /gb=AI916544 /gi=5636399 /ug=Hs.158549 /len=475' | 0.013059 | 3.268418769 
Cluster Incl. AI919493:tp22a03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI919493 /gi=[illegible] /ug=Hs.212925 /len=320' | 0.016554 | 3.264815022 
Cluster Incl. AI681180:tx44h02.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI681180 /gi=[illegible] /ug=Hs.189394 /len=489' | 0.003186 | 3.258903714 
Cluster Incl. AA811371:6b82b10.s1 Homo sapiens cDNA /clone=IMAGE-1337851 /gb=AA811371 /gi=[illegible] /ug=Hs.123362 /len=482 | 0.008956 | 3.254465131 
Cluster Incl. AW020375:df08h01.y1 Homo sapiens cDNA, 5 end /clone=IMAGE-2483185 /clone_end=5' /gb=AW020375 /gi=5873905 /ug=Hs.238653 /len=339' | 0.000033 | 3.238426465 
Cluster Incl. AI683864:tw54a08.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI683864 /gi=[illegible] /ug=Hs.149264 /len=489' | 0.002982 | 3.237963905 
Cluster Incl. AI888991:wj16b04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2402959 /clone_end=3' /gb=AI888991 /gi=[illegible] /ug=Hs.204044 /len=528' | 0.002743 | 3.22255218 
Cluster Incl. AI810266:wb86h07.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2312605 /clone_end=3' /gb=AI810266 /gi=5396832 /ug=Hs.130853 /len=553' | 0.031896 | 3.218521352 
Cluster Incl. AI916889:wb46g08.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2308766 /clone_end=3' /gb=AI916889 /gi=5636744 /ug=Hs.213436 /len=480' | 0.003436 | 3.213809716 

Table 6. U95_E Fold Change Genes (>3 over-expressed in Barrett's-associated 
esophageal adenocarcinoma (BA), <0.33 under-expressed in BA) 

Gene Name | P-value | Fold Change 
Cluster Incl. AI052526:oz27f09.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-1676585 /clone_end=3' /gb=AI052526 /gi=[illegible] /ug=Hs.233871 /len=290' | 0.009676 | 2.999958702 
Cluster Incl. AI985612:wr75d05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2493513 /clone_end=3' /gb=AI985612 /gi=[illegible] /ug=Hs.184484 /len=486' | 0.008916 | 0.329081864 
Cluster Incl. AI434477:ti37d01.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2132641 /clone_end=3' /gb=AI434477 /gi=4296452 /ug=Hs.210531 /len=421' | 0.0046 | 0.326191217 
Cluster Incl. AA813527:ai67f11.s1 Homo sapiens cDNA, 3 end /clone=1375917 /clone_end=3' /gb=AA813527 /gi=[illegible] /ug=Hs.122814 /len=459' | 0.0109 | 0.325630713 
Cluster Incl. AI168188:oo09g11.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI168188 /gi=[illegible] /ug=Hs.225023 /len=448' | 0.000589 | 0.324150837 
Cluster Incl. AI093188:qa98b05.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-1694769 /clone_end=3' /gb=AI093188 /gi=3432164 /ug=Hs.215319 /len=[illegible]' | 0.003636 | 0.323684024 
Cluster Incl. R19892:yg38f12.r1 Homo sapiens cDNA, 5 end /clone=IMAGE-34798 /clone_end=5' /gb=R19892 /gi=[illegible] /ug=[illegible] /len=503' | 0.00589 | 0.320952495 
X00351 Human mRNA for beta-actin (_5, _M, _3 represent transcript regions 5 prime, Middle, and 3 prime respectively) | 0.004505 | 0.320408458 
Cluster Incl. AA814901:oa75g08.s1 Homo sapiens cDNA /clone=IMAGE-1318142 /gb=AA814901 /gi=[illegible] /ug=[illegible] /len=459 | 0.042106 | 0.320177197 
Cluster Incl. AA313781:EST185644 Homo sapiens cDNA, 5 end /clone=ATCC-109963 /clone_end=5' /gb=AA313781 /gi=1966110 /ug=Hs.236903 /len=599' | 0.001602 | 0.320113508 
Cluster Incl. N23781:yx35e09.r1 Homo sapiens cDNA, 5 end /clone=IMAGE-263752 /clone_end=5' /gb=N23781 /gi=1137931 /ug=Hs.226614 /len=592' | 0.028338 | 0.31964593 
Cluster Incl. AA779712:af43h05.s1 Homo sapiens cDNA, 3 end /clone=[illegible] /clone_end=3' /gb=AA779712 /gi=[illegible] /ug=Hs.208718 /len=606' | 0.010363 | 0.318766299 
Cluster Incl. AI684559:wa84a03.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-[illegible] /clone_end=3' /gb=AI684559 /gi=4895853 /ug=Hs.201637 /len=507' | 0.020149 | 0.315917401 
Cluster Incl. AI525044:promma-5.C09.r Homo sapiens cDNA, 5 end /clone_end=5' /gb=AI525044 /gi=4439179 /ug=Hs.168007 /len=639' | 0.009654 | 0.313783504 
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P-value 


0.008401 


0.03165 


0.014896 


0.007489 


0.040979 


0.000087 


0.000765 


0.000025 


0.006092 


0.027087 


0.000743 


0.000025 


0.033101 


0.00007 


Fold Change 


0.312957775 


0.312853457 


0.312511549 


0.312460844 


0.31169073 


0.311634623 


0.310115615 


0.309511353


0.307974308 


0.30729455 


0.304600093 


0.303909365 


0.303863976 


0.30383599 


Table 6. U95_E Fold Change Genes (>3 over-expressed in Barrett's
associated esophageal adenocarcinoma (BA), <0.33 under-expressed in BA)

Affy ID  Gene Name


Cluster Incl. AI561042:tq29e02.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI561042 /gi=[illegible] /ug=Hs.239771 /len=635

Cluster Incl. AI243125:qh26h01.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-1845841 /clone_end=3'
/gb=AI243125 /gi=[illegible] /ug=Hs.182947 /len=388

Cluster Incl. AI858718:wl41f12.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-2427503 /clone_end=3'
/gb=AI858718 /gi=[illegible] /ug=Hs.226562 /len=654

Cluster Incl. AW015534:UI-H-BI0p-aau-b-12-0-UI.s1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible]
/clone_end=3' /gb=AW015534 /gi=5864281 /ug=Hs.234248 /len=519

Cluster Incl. AI732539:ni01f09.x5 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI732539 /gi=[illegible] /ug=Hs.180142 /len=508

Cluster Incl. R35259:yg61b10.r1 Homo sapiens cDNA, 5' end /clone=IMAGE-37315 /clone_end=5'
/gb=R35259 /gi=[illegible] /ug=Hs.213548 /len=513

Cluster Incl. AI808615:wf56f01.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI808615 /gi=[illegible] /ug=Hs.202625 /len=443

Cluster Incl. AI420234:te98b01.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI420234 /gi=[illegible] /ug=Hs.163645 /len=501

Cluster Incl. AI935292:wp16e07.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI935292 /gi=[illegible] /ug=Hs.153408 /len=573

Cluster Incl. AA702143:zi85h05.s1 Homo sapiens cDNA, 3' end /clone=447609 /clone_end=3' /gb=AA702143
/gi=[illegible] /ug=Hs.190365 /len=447

Cluster Incl. AI681868:tx50a12.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-[illegible] /clone_end=3'
/gb=AI681868 /gi=4892050 /ug=Hs.178784 /len=562

Cluster Incl. AI023259:ov64g12.s1 Homo sapiens cDNA, 3' end /clone=IMAGE-1642150 /clone_end=3'
/gb=AI023259 /gi=3938500 /ug=Hs.215260 /len=50[illegible]

Cluster Incl. N73382:EST55b03 Homo sapiens cDNA /clone=55b03 /gb=N73382 /gi=[illegible] /ug=Hs.[illegible]
/len=398

Cluster Incl. AI653380:wb45d09.x1 Homo sapiens cDNA, 3' end /clone=IMAGE-2308625 /clone_end=3'
/gb=AI653380 /gi=4737359 /ug=Hs.133081 /len=517
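
By way of non-limiting illustration, the selection criterion stated in the caption of Table 6 (fold change >3 for over-expression in BA, <0.33 for under-expression in BA) may be sketched in a few lines of Python. The function name, the label strings and the reading of the fold change as a BA-to-normal expression ratio are assumptions made for this sketch only and do not appear in the tables; the values 1.0 and 4.2 are hypothetical inputs, while the first two values are taken from the Fold Change column above.

```python
def classify_fold_change(ratio, over=3.0, under=0.33):
    """Label a fold change value using the cut-offs given in the Table 6 caption.

    `ratio` is assumed (for this sketch) to be expression in BA divided by
    expression in normal tissue.
    """
    if ratio > over:
        return "over-expressed in BA"
    if ratio < under:
        return "under-expressed in BA"
    return "not selected"


if __name__ == "__main__":
    # 0.313783504 and 0.303909365 come from the table; 1.0 and 4.2 are hypothetical.
    for value in (0.313783504, 0.303909365, 1.0, 4.2):
        print(value, classify_fold_change(value))
```

Values falling between the two cut-offs are simply reported as not selected; how such genes were handled in preparing the table is not stated here.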
Table 8. Genes identified by hierarchical clustering of
the full Human Genome U95 set showing fold change between normal and diseased
sample sets.



ACCESSION    FOLD CHANGE    NAME

AI199897     -22.0          EST
W69365       -8.2           EST
AI081571     -8.9           EST
AI962905     -9.6           CGI-119 PROTEIN
AI885390     -8.6           EST
X94323       -11.2          SPECIFIC GRANULE PROTEIN
M62982       -11.4          ARACHIDONATE 12-LIPOXYGENASE
X87159       -6.5           SODIUM CHANNEL (SCNN1B)
M32402       -9.7           PLACENTA PROTEIN 11
AI739630     -19.7          EST
AW025309     -40.5          EST
AB001325     -20.2          AQUAPORIN 3
AI582193     -35.1          EST
AB002134     -12.1          AIRWAY TRYPSIN-LIKE PROTEASE
AL050220     -17.6          KALLIKREIN 13
AI971202     -12.2          EST
AI669212     -14.3          EST
AI916261     -9.2           EST
M24902       -1.0           ACID PHOSPHATASE PROSTATE
U83115       -8.1           AIM1 (ABSENT IN MELANOMA)
Y09538       -17.3          ZINC FINGER PROTEIN 185
AI142832     -15.4          EST
Y16961       -12.6          TUMOR PROTEIN P63
AA130221     -22.4          EST
AA781220     -19.0          PAIRED BOX GENE 9 (PAX9)
AI282714     -77.6          DESC1 PROTEIN
M98477       -22.2          TRANSGLUTAMINASE 1
AI540870     -27.6          EST
AF045941     -24.5          SCIELLIN
R37637       -44.5          EST
AA743820     -49.5          EST
AI623978     -03.1          EST
AI859619     -15.9          EST
AJ223693     -10.5          GPI-ANCHORED HOMOLOG
AI378979     -37.1          KATANIN P60 SUBUNIT A1
S66896       -14.2          SCCA1
W68630       -28.5          EST
AA401397     -16.4          KATANIN 13
AA010777     -35.7          GALECTIN 7
L10386       -25.1          TRANSGLUTAMINASE 3
AI814274     -123.8         EST
X76342       -20.7          ALCOHOL DEHYDROGENASE 7
AI692575     -46.0          EST


ACCESSION    FOLD CHANGE    NAME

T71258       -23.9          [illegible]
AI265958     -58.2          EST
M13903       -15.7          INVOLUCRIN
M60047       -20.7          HBP17
X99977       -35.3          ARS GENE, COMPONENT B
AI369347     -7.9           EST
AI052020     -7.4           EST
AI818579     2.5            EST
AI557210     4.7            EST
AI754693     4.1            EST
AB029000     8.3            MRNA FOR KIAA1077 PROTEIN
U09278       7.9            FIBROBLAST ACTIVATION PROTEIN, ALPHA
AA044844     3.4            SOLUTE CARRIER FAMILY 11, MEMBER [illegible]
J04162       3.1            Fc FRAGMENT OF IgG, LOW AFFINITY IIIA
AA147884     5.0            EST
AA584310     13.2           CGI-101 PROTEIN
D21254       3.8            CADHERIN-11
D21255       3.9            CADHERIN-11
X82153       2.4            CATHEPSIN K
AA127736     3.6            COLLAGEN, TYPE V, ALPHA 2
AW007442     6.4            BIGLYCAN
Z37976       2.6            LATENT TGFB BINDING PROTEIN 2
AI686894     3.5            EST
AA007367     6.6            EST
AF052124     9.5            SPP1 (OSTEOPONTIN)
AA088177     4.3            EST
J04765       5.4            SPP1 (OSTEOPONTIN)
AA447232     4.3            CATHEPSIN B
X15998       4.3            CHONDROITIN SULFATE PROTEOGLYCAN 2
AA426499     3.8            CHONDROITIN SULFATE PROTEOGLYCAN 2
X15998       2.9            CHONDROITIN SULFATE PROTEOGLYCAN 2
AA704137     5.4            THY-1 CELL SURFACE ANTIGEN
AI740961     5.7            [illegible]
AI864014     17.2           SPP1 (OSTEOPONTIN)
D13666       5.0            OSTEOBLAST SPECIFIC FACTOR 2
AL050137     3.1            EST
AI970896     3.1            EST
AF020044     2.0            STEM CELL GROWTH FACTOR
AI333224     3.0            EST
AI091277     3.9            EST
W74476       4.7            EST
AA056278     5.3            EHM2 GENE



ACCESSION    FOLD CHANGE    NAME

AA877900     [illegible]    HYPOTHETICAL PROTEIN FLJ20063
AI799626     [illegible]    EST
M35252       [illegible]    TRANSMEMBRANE 4 SUPERFAMILY MEMBER 3
AI961220     [illegible]    SERINE PROTEASE INHIBITOR (SPINK1)
AI148745     9.6            [illegible]
AI982768     5.7            EST
H30385       [illegible]    EST
AB023171     [illegible]    MRNA FOR KIAA[illegible] PROTEIN
AW007803     [illegible]    EST
AA458524     7.5            EST
AI301060     9.0            EST
AI859849     16.2           EST
AA156240     5.9            [illegible]
AT6Q1066     15.8           EST
AB006781     20.9           GALECTIN 4
AA535447     18.6           EST
AI125252     15.9           EST
AI308063     126.9          EST
U73167       4.3            COSMID CLONE
N30008       3.8            EST
AI392817     5.6            HEPATOCYTE NUCLEAR FACTOR 3 GAMMA
AB018335     5.2            MRNA FOR KIAA0792 PROTEIN
AF065388     5.3            TETRASPAN-1
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What is claimed is: 

1. A method of diagnosing esophageal cancer in a patient, comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 2-8; wherein differential expression of the genes in Tables 2-8 is indicative of
esophageal cancer.

2. A method of detecting the progression of esophageal cancer in a patient,
comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 2-8; wherein differential expression of the genes in Tables 2-8 is indicative of
esophageal cancer progression.

3. A method according to claim 2, wherein the progression is the progression of
Barrett's esophagus to adenocarcinoma.

4. A method of monitoring the treatment of a patient with esophageal cancer,
comprising:

(a) administering a pharmaceutical composition to the patient;

(b) preparing a gene expression profile from a cell or tissue sample from the patient;
and

(c) comparing the patient gene expression profile to a gene expression from a cell
population selected from the group consisting of normal esophageal cells, cells from Barrett's
esophagus and esophageal adenocarcinoma cells.

5. A method of treating a patient with esophageal cancer, comprising:

(a) administering to the patient a pharmaceutical composition, wherein the
composition alters the expression of at least one gene in Tables 2-8;

(b) preparing a gene expression profile from a cell or tissue sample from the patient
comprising esophageal cancer cells; and
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(c) comparing the patient expression profile to a gene expression profile selected from 
the group consisting of normal esophageal cells, cells from Barrett's esophagus and 
esophageal adenocarcinoma cells. 

6. A method of diagnosing esophageal adenocarcinoma in a patient, comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 2-8; wherein differential expression of the genes in Tables 2-8 is indicative of
esophageal adenocarcinoma.

7. A method of detecting the progression of esophageal adenocarcinoma in a patient,
comprising:

(a) detecting the level of expression in a tissue sample of two or more genes from
Tables 2-8; wherein differential expression of the genes in Tables 2-8 is indicative of
esophageal adenocarcinoma progression.

8. A method of monitoring the treatment of a patient with esophageal
adenocarcinoma, comprising:

(a) administering a pharmaceutical composition to the patient;

(b) preparing a gene expression profile from a cell or tissue sample from the patient;
and

(c) comparing the patient gene expression profile to a gene expression from a cell
population comprising normal esophageal cells or to a gene expression profile from a cell
population comprising esophageal adenocarcinoma cells or to both.

9. A method of treating a patient with esophageal adenocarcinoma, comprising:

(a) administering to the patient a pharmaceutical composition, wherein the
composition alters the expression of at least one gene in Tables 2-8;

(b) preparing a gene expression profile from a cell or tissue sample from the patient
comprising esophageal adenocarcinoma cells; and

(c) comparing the patient expression profile to a gene expression profile from an
untreated cell population comprising esophageal adenocarcinoma cells.
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10. A method of screening for an agent capable of modulating the onset or 
progression of esophageal cancer, comprising: 

(a) preparing a first gene expression profile of a cell population comprising 
esophageal cancer cells, wherein the expression profile determines the expression level of one
or more genes from Tables 2-8;

(b) exposing the cell population to the agent; 

(c) preparing second gene expression profile of the agent-exposed cell population; and 

(d) comparing the first and second gene expression profiles. 

11. The method of claim 14, wherein the esophageal cancer is a esophageal
adenocarcinoma.

12. A composition comprising at least two oligonucleotides, wherein each of the
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 2-8.

13. A composition according to claim 12, wherein the composition comprises at least
3 oligonucleotides.

14. A composition according to claim 12, wherein the composition comprises at least
5 oligonucleotides.

15. A composition according to claim 12, wherein the composition comprises at least
7 oligonucleotides.

16. A composition according to claim 12, wherein the composition comprises at least
10 oligonucleotides.

17. A composition according to any one of claims 12-16, wherein the
oligonucleotides are attached to a solid support.

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18. A composition according to claim 17, wherein the solid support is selected from a 
group consisting of a membrane, a glass support, a filter, a tissue culture dish, a polymeric 
material, a bead and a silica support. 

19. A solid support comprising at least two oligonucleotides, wherein each of the 
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 2-8. 

20. A solid support according to claim 19, wherein the oligonucleotides are 
covalently attached to the solid support. 

21. A solid support according to claim 20, wherein the oligonucleotides are
non-covalently attached to the solid support.

22. A solid support according to claim 19, wherein the support comprises at least 
about 10 different oligonucleotides in discrete locations per square centimeter. 

23. A solid support according to claim 19, wherein the support comprises at least 
about 100 different oligonucleotides in discrete locations per square centimeter. 

24. A solid support according to claim 19, wherein the support comprises at least 
about 1000 different oligonucleotides in discrete locations per square centimeter. 

25. A solid support according to claim 19, wherein the support comprises at least 
about 10,000 different oligonucleotides in discrete locations per square centimeter. 

26. A computer system comprising: 

(a) a database containing information identifying the expression level in esophageal 
tissue of a set of genes comprising at least two genes in Tables 2-8; and 

(b) a user interface to view the information. 

26. A computer system of claim 25, wherein the database further comprises sequence 
information for the genes. 
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27. A computer system of claim 25, wherein the database further comprises 
information identifying the expression level for the set of genes in normal esophageal tissue. 

28. A computer system of claim 25, wherein the database further comprises
information identifying the expression level of the set of genes in esophageal cancer tissue.

29. A computer system of claim 28, wherein the esophageal cancer tissue comprises
esophageal adenocarcinoma cells.

30. A computer system of claim 31-36, further comprising records including
descriptive information from an external database, which information correlates said genes to
records in the external database.

31. A computer system of claim 30, wherein the external database is GenBank.

32. A method of using a computer system of any one of claims 26-29 to present
information identifying the expression level in a tissue or cell of at least one gene in
Tables 2-8, comprising:

(a) comparing the expression level of at least one gene in Tables 2-8 in the tissue or
cell to the level of expression of the gene in the database.

33. A method of claim 32, wherein the expression level of at least two genes are
compared.

34. A method of claim 32, wherein the expression level of at least five genes are
compared.

35. A method of claim 32, wherein the expression level of at least ten genes are
compared.

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36. A method of claim 32, further comprising displaying the level of expression of at 
least one gene in the tissue or cell sample compared to the expression level in esophageal 
cancer. 
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AMENDED CLAIMS 

[received by the International Bureau on 2 October 2001 (02.10.01);
original claims 11, 21 and 27-36 amended; remaining claims unchanged (4 pages)]

10. A method of screening for an agent capable of modulating the onset or 
progression of esophageal cancer, comprising: 

(a) preparing a first gene expression profile of a cell population comprising 
esophageal cancer cells, wherein the expression profile determines the expression level of one 
or more genes from Tables 2-8; 

(b) exposing the cell population to the agent; 

(c) preparing second gene expression profile of the agent-exposed cell population; and 

(d) comparing the first and second gene expression profiles. 

11. The method of claim 10, wherein the esophageal cancer is a esophageal
adenocarcinoma.

12. A composition comprising at least two oligonucleotides, wherein each of the 
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 2-8. 

13. A composition according to claim 12, wherein the composition comprises at least 
3 oligonucleotides. 

14. A composition according to claim 12, wherein the composition comprises at least 
5 oligonucleotides. 

15. A composition according to claim 12, wherein the composition comprises at least 
7 oligonucleotides. 

16. A composition according to claim 12, wherein the composition comprises at least 
10 oligonucleotides. 

17. A composition according to any one of claims 12-16, wherein the 
oligonucleotides are attached to a solid support. 
18. A composition according to claim 17, wherein the solid support is selected from a
group consisting of a membrane, a glass support, a filter, a tissue culture dish, a polymeric
material, a bead and a silica support.

19. A solid support comprising at least two oligonucleotides, wherein each of the 
oligonucleotides comprises a sequence that specifically hybridizes to a gene in Tables 2-8. 

20. A solid support according to claim 19, wherein the oligonucleotides are 
covalently attached to the solid support. 

21. A solid support according to claim 19, wherein the oligonucleotides are
non-covalently attached to the solid support.

22. A solid support according to claim 19, wherein the support comprises at least 
about 10 different oligonucleotides in discrete locations per square centimeter. 

23. A solid support according to claim 19, wherein the support comprises at least 
about 100 different oligonucleotides in discrete locations per square centimeter. 

24. A solid support according to claim 19, wherein the support comprises at least 
about 1000 different oligonucleotides in discrete locations per square centimeter. 

25. A solid support according to claim 19, wherein the support comprises at least 
about 10,000 different oligonucleotides in discrete locations per square centimeter. 

26. A computer system comprising: 

(a) a database containing information identifying the expression level in esophageal 
tissue of a set of genes comprising at least two genes in Tables 2-8; and 

(b) a user interface to view the information. 

27. A computer system of claim 26, wherein the database further comprises sequence
information for the genes.
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28. A computer system of claim 26, wherein the database further comprises 
information identifying the expression level for the set of genes in normal esophageal tissue. 

29. A computer system of claim 26, wherein the database further comprises 
information identifying the expression level of the set of genes in esophageal cancer tissue. 

30. A computer system of claim 29, wherein the esophageal cancer tissue comprises 
esophageal adenocarcinoma cells. 

31. A computer system of claim 26-30, further comprising records including 
descriptive information from an external database, which information correlates said genes to 
records in the external database. 

32. A computer system of claim 31, wherein the external database is GenBank.

33. A method of using a computer system of any one of claims 26-30 to present
information identifying the expression level in a tissue or cell of at least one gene in
Tables 2-8, comprising:

(a) comparing the expression level of at least one gene in Tables 2-8 in the tissue or 
cell to the level of expression of the gene in the database. 

34. A method of claim 33, wherein the expression level of at least two genes are 
compared. 

35. A method of claim 33, wherein the expression level of at least five genes are
compared.

36. A method of claim 33, wherein the expression level of at least ten genes are 
compared. 
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37. A method of claim 33, further comprising displaying the level of expression of at 
least one gene in the tissue or cell sample compared to the expression level in esophageal 
cancer. 
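
By way of non-limiting illustration, the comparing and displaying steps recited in claims 33 to 37 might be sketched as follows, assuming a minimal in-memory database. Every name in this sketch (`REFERENCE_DB`, `compare_to_database`, `display_comparison`) is hypothetical, and the numeric expression levels are placeholders invented for the example, not data from Tables 2-8; only the two accession numbers and gene names are taken from Table 8.

```python
# Hypothetical reference database: accession -> gene name and reference
# expression level per tissue type. The levels are made-up placeholders.
REFERENCE_DB = {
    "AB001325": {"name": "AQUAPORIN 3", "normal": 2020.0, "cancer": 100.0},
    "AB006781": {"name": "GALECTIN 4", "normal": 50.0, "cancer": 1045.0},
}


def compare_to_database(sample_levels, tissue="cancer", db=REFERENCE_DB):
    """Return {accession: sample level / reference level} for genes found in db."""
    ratios = {}
    for accession, level in sample_levels.items():
        record = db.get(accession)
        if record is None:
            continue  # gene is not in the database, so there is nothing to compare
        ratios[accession] = level / record[tissue]
    return ratios


def display_comparison(ratios, db=REFERENCE_DB):
    """Print one line per compared gene (a stand-in for the user interface)."""
    for accession, ratio in sorted(ratios.items()):
        print(f"{accession}\t{db[accession]['name']}\t{ratio:.2f} x reference")


if __name__ == "__main__":
    sample = {"AB001325": 200.0, "AB006781": 900.0}  # hypothetical sample levels
    display_comparison(compare_to_database(sample, tissue="cancer"))
```

A simple ratio is used here only because it is the easiest comparison to read; the claims do not prescribe any particular comparison metric or data store.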
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STATEMENT UNDER PCT ARTICLE 19(1) 

Applicants herewith submit replacement sheets numbered 158-161 to replace sheets 
numbered 158-161 as originally filed on 28 March 2001 at the U.S. Receiving Office 
(RO/US). 

With respect to each claim appearing in the international application based on the
replacement sheets submitted herewith, and in accordance with PCT Section 205, the
following claims are:

unchanged: claim(s) 10, 12-17, 18-20, and 22-26

canceled: claim(s) none

new: claim(s) none

Applicants have amended claims 11, 21, and 27-36 to correct the claim numbering
and the dependencies of the renumbered claims. The amendment does not go beyond the
disclosure as originally filed.

Applicants request entry of the amendment and publication of PCT/US01/09847 with
the amended claims.
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